Abstract

The recent growth of the Internet and its increased heterogeneity
have increased the complexity of network protocol design and
testing. In addition, the advent of multipoint (multicast-based)
applications has introduced new challenges that are qualitatively
different in nature from those of traditional point-to-point
protocols. Multipoint applications typically involve a group of
participants simultaneously, and hence are inherently more complex.
As more multipoint protocols are coming to life, the need for a
systematic method to study and evaluate such protocols is becoming
more apparent. Such a method aims to expedite the protocol
development cycle and improve protocol robustness and performance.

In this paper, we present a new methodology for develop- 
ing systematic and automatic test generation algorithms for 
multipoint protocols. These algorithms attempt to synthesize 
network topologies and sequences of events that stress the 
protocol's correctness or performance. This problem can be 
viewed as a domain-specific search problem that suffers from 
the state space explosion problem. One goal of this work is to 
circumvent the state space explosion problem utilizing knowl- 
edge of network and fault modeling, and multipoint protocols. 
The two approaches investigated in this study are based on for- 
ward and backward search techniques. We use an extended 
finite state machine (FSM) model of the protocol. The first 
algorithm uses forward search to perform reduced reachabil- 
ity analysis. Using domain-specific information for multicast 
routing over LANs, the algorithm complexity is reduced from 
exponential to polynomial in the number of routers. This ap- 
proach, however, does not fully automate topology synthesis. 
The second algorithm, the fault-oriented test generation, uses 
backward search for topology synthesis and uses backtracking 
to generate event sequences instead of searching forward from 
initial states. 

Using these algorithms, we have conducted studies for cor- 
rectness of the multicast routing protocol PIM. We propose to 
extend these algorithms to study end-to-end multipoint pro- 
tocols using a virtual LAN that represents delays of the un- 
derlying multicast distribution tree. 

I. Introduction 

Network protocols are becoming more complex with the 
exponential growth of the Internet, and the introduction of 
new services at the network, transport and application lev- 
els. In particular, the advent of IP multicast and the MBone 
enabled applications ranging from multi-player games to dis- 
tance learning and teleconferencing, among others. To date, 
little effort has been exerted to formulate systematic meth- 
ods and tools that aid in the design and characterization of 
these protocols. 

In addition, researchers are observing new and obscure, yet 
all too frequent, failure modes over the Internet [1]. Such
failures are becoming more frequent, mainly due to the in- 
creased heterogeneity of technologies, interconnects and con- 
figuration of various network components. Due to the syn- 
ergy and interaction between different network protocols and 
components, errors at one layer may lead to failures at other
layers of the protocol stack. Furthermore, degraded performance
of low level network protocols may have ripple effects
on end-to-end protocols and applications. 

Network protocol errors are often detected by application 
failure or performance degradation. Such errors are hardest 
to diagnose when the behavior is unexpected or unfamiliar. 
Even if a protocol is proven to be correct in isolation, its 
behavior may be unpredictable in an operational network, 
where interaction with other protocols and the presence of 
failures may affect its operation. Protocol errors may be 
very costly to repair if discovered after deployment. Hence, 
endeavors should be made to capture protocol flaws early in 
the design cycle before deployment. To provide an effective 
solution to the above problems, we present a framework for 
the systematic design and testing of multicast protocols. The 
framework integrates test generation algorithms with simu- 
lation and implementation. We propose a suite of practical 
methods and tools for automatic test generation for network 
protocols. 

Many researchers have developed protocol verification methods
to ensure certain properties of protocols, like freedom from
deadlocks or unspecified receptions. Much of this work, however,
was based on assumptions about network conditions that may not
always hold in today's Internet, and hence may become invalid.
Other approaches, such
as reachability analysis, attempt to check the protocol state 
space, and generally suffer from the 'state explosion' problem. 
This problem is exacerbated with the increased complexity of 
the protocol. Much of the previous work on protocol verifica- 
tion targets correctness. We target protocol performance and 
robustness in the presence of network failures. In addition, 
we provide new methods for studying multicast protocols and 
topology synthesis that previous works do not provide. 

We investigate two approaches for test generation. The 
first approach, called the fault-independent test generation, 
uses a forward search algorithm to explore a subset of the 
protocol state space to generate the test events automatically. 
State and fault equivalence relations are used in this approach 
to reduce the state space. The second approach is called 
the fault-oriented test generation, and uses a mix of forward 
and backward search techniques to synthesize test events and 
topologies automatically. 

We have applied these methods to multicast routing. Our 
case studies revealed several design errors, for which we have 
formulated solutions with the aid of this systematic process. 

We further suggest an extension of the model to include 
end-to-end delays using the notion of virtual LAN. Such ex- 
tension, in conjunction with the fault-oriented test genera- 
tion, can be used for performance evaluation of end-to-end 
multipoint protocols. 

The rest of this document is organized as follows. Sec- 
Fig. 1. Establishing multicast delivery tree



Multicast distribution trees may be established by either 
broadcast-and-prune or explicit join protocols. In the former, 
such as DVMRP or PIM-DM, a multicast packet is broadcast 
to all leaf subnetworks. Subnetworks with no local members 
for the group send prune messages towards the source(s) of 
the packets to stop further broadcasts. Link state protocols, 
such as MOSPF, broadcast membership information to all 
nodes. In contrast, in explicit join protocols, such as CBT 
or PIM-SM, routers send hop-by-hop join messages for the 
groups and sources for which they have local members. 
We conduct robustness case studies for PIM-DM. We are par- 
ticularly interested in multicast routing protocols, because 
they are vulnerable to failure modes, such as selective loss, 
that have not been traditionally studied in the area of pro- 
tocol design. 

For most multicast protocols, when routers are connected via
a multi-access network (or LAN), hop-by-hop messages are
multicast on the LAN, and may experience selective loss; i.e.,
they may be received by some nodes but not others. The likelihood
of selective loss is increased by the fact that LANs often
contain hubs, bridges, switches, and other network devices.
Selective loss may affect protocol robustness.
Similarly, end-to-end multicast protocols and applications
must deal with situations of selective loss. This differentiates
these applications most clearly from their unicast counterparts,
and raises interesting robustness questions.
Our case studies illustrate why selective loss should be
considered when evaluating protocol robustness. This lesson is
likely to extend to the design of higher layer protocols that
operate on top of multicast and can have similar selective
loss.

Footnote: We include appendices for completeness.

Footnote: We use the term LAN to designate a connected network with
respect to IP multicast. This includes shared media (such as Ethernet
or FDDI), hubs, switches, etc.

II. Framework Overview 

Protocols may be evaluated for correctness or performance. 
We refer to correctness studies that are conducted in the ab- 
sence of network failures as verification. In contrast, robust- 
ness studies consider the presence of network failures (such 
as packet loss or crashes). In general, the robustness of a 
protocol is its ability to respond correctly in the face of net- 
work component failures and packet loss. This work presents 
a methodology for studying and evaluating multicast proto- 
cols, specifically addressing robustness and performance is- 
sues. We propose a framework that integrates automatic test 
generation as a basic component for protocol design, along 
with protocol modeling, simulation and implementation test- 
ing. The major contribution of this work lies in developing 
new methods for generating stress test scenarios that target 
robustness and correctness violation, or worst case perfor- 
mance. 

Instead of studying protocol behavior in isolation, we in- 
corporate the protocol model with network dynamics and 
failures in order to reveal more realistic behavior of protocols 
in operation. 

This section presents an overview of the framework and its 
constituent components. The model used to represent the 
protocol and the system is presented along with definitions 
of the terms used. 

Our framework integrates test generation with simulation 
and implementation code. It is used for Systematic Testing 
of Robustness by Evaluation of Synthesized Scenarios
(STRESS). As the name implies, systematic methods for sce- 
nario synthesis are a core part of the framework. We use 
the term scenarios to denote the test-suite consisting of the 
topology and events. 

The input to this framework is the specification of a pro- 
tocol, and a definition of its design requirements, in terms of 
correctness or performance. Usually robustness is defined in 
terms of network dynamics or fault models. A fault model 
represents various component faults; such as packet loss, cor- 
ruption, re-ordering, or machine crashes. The desired output 
is a set of test-suites that stress the protocol mechanisms 
according to the robustness criteria. 

As shown in Figure 2, the STRESS framework includes
test generation, detailed simulation driven by the synthesized 
tests, and protocol implementation driven through an emu- 
lation interface to the simulator. In this work we focus on 
the test generation (TG) component. 



Fig. 2. The STRESS framework



A. Test Generation 

The core contribution of our work lies in the development 
of systematic test generation algorithms for protocol robust- 
ness. We investigate two such algorithms, each using a dif- 
ferent approach. 

In general, test generation may be random or deterministic.
Generation of random tests is simple, but a large set of tests
is needed to achieve a high measure of error coverage.
Deterministic test generation (TG), on the other hand, produces
tests based on a model of the protocol. The knowledge built
into the protocol model enables the production of shorter
and higher-quality test sequences. Deterministic TG can be:
a) fault-independent, or b) fault-oriented. Fault-independent
TG works without targeting individual faults as defined by
the fault model. Such an approach may employ a forward
search technique to inspect the protocol state space (or an
equivalent subset thereof), after integrating the fault into the
protocol model. In this sense, it may be considered a variant
of reachability analysis. We use the notion of equivalence
to reduce the search complexity. Our fault-independent
approach is described in a later section.

In contrast, fault-oriented tests are generated for specified 
faults. Fault-oriented test generation starts from the fault 
(e.g. a lost message) and synthesizes the necessary topology 
and sequence of events that trigger the error. This algorithm 
uses a mix of forward and backward searches. We present
our fault-oriented algorithm in a later section.

We conduct case studies for the multicast routing proto- 
col PIM-DM to illustrate differences between the approaches, 
and provide a basis for comparison. 

In the remainder of this section, we describe the system
model and definitions.



B. The system model 

We define our target system in terms of network and topol- 
ogy elements and a fault model. 

B.1 Elements of the network

Elements of the network consist of multicast capable nodes 
and bi-directional symmetric links. Nodes run the same multicast
routing protocol, but not necessarily the same unicast routing.
The topology is an N-router LAN modeled at the network
level; we do not model the MAC layer.

For end-to-end performance evaluation, the multicast dis- 
tribution tree is abstracted out as delays between end systems 
and patterns of loss for the multicast messages. Cascades of
LANs or uniform topologies are left to future research.

B.2 The fault model

We distinguish between the terms error and fault. An error 
is a failure of the protocol as defined in the protocol design 
requirement and specification. For example, duplication in 
packet delivery is an error for multicast routing. A fault is 
a low level (e.g. physical layer) anomalous behavior, that 
may affect the behavior of the protocol under test. Note 
that a fault may not necessarily be an error for the low level 
protocol. 

The fault model may include:
(a) Loss of packets, such as packet loss due to congestion or link
failures. We take into consideration selective packet loss, where a
multicast packet may be received by some members of the group but
not others.
(b) Loss of state, such as multicast and/or unicast routing tables,
due to machine crashes or insufficient memory resources.
(c) The delay model, such as transmission, propagation, or queuing
delays. For end-to-end multicast protocols, the delays are those of
the multicast distribution tree and depend upon the multicast
routing protocol.
(d) Unicast routing anomalies, such as route inconsistencies,
oscillations or flapping.
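
To make the notion of selective packet loss in item (a) concrete, the following minimal sketch enumerates the possible loss patterns for one multicast message on a LAN. The function names and the router labels R2 to R4 are illustrative assumptions, not part of the paper's model.

```python
from itertools import combinations

def loss_patterns(receivers):
    """Yield every non-empty set of receivers that miss one multicast message."""
    receivers = list(receivers)
    for k in range(1, len(receivers) + 1):
        for lost in combinations(receivers, k):
            yield frozenset(lost)

def is_selective(lost, receivers):
    """Selective loss: some receivers get the message and some do not."""
    return 0 < len(lost) < len(set(receivers))

# Hypothetical LAN: one sender and three receiving routers.
receivers = ["R2", "R3", "R4"]
patterns = list(loss_patterns(receivers))
selective = [p for p in patterns if is_selective(p, receivers)]
print(len(patterns), len(selective))
```

With three receivers there are seven non-empty loss patterns, six of which are selective; the number of patterns grows exponentially with the number of receivers, which is one reason faults enlarge the search space.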

Usually, a fault model is defined in conjunction with the 
robustness criteria for the protocol under study. For our 
robustness studies we study PIM. The robustness design
goal for PIM is to be able to recover gracefully (i.e., without
going into erroneous stable states) from single protocol
message loss. That is, being robust to a single message loss 
implies that transitions cause the protocol to move from one 
correct stable state to another, even in the presence of se- 
lective message loss. In addition, we study PIM protocol 
behavior in presence of crashes and route inconsistencies. 

C. Test Sequence Definition 

A fault model may include a single fault or multiple faults. 
For our robustness studies we adopt a single-fault model, 
where only a single fault may occur during a scenario or a 
test sequence. 
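
As a rough illustration of the single-fault model, the sketch below derives candidate faulty sequences from a fault-free event sequence by inserting exactly one fault. The function name, the event names in the usage example, and the choice to insert the fault only after at least one event are assumptions made for illustration.

```python
def single_fault_sequences(events, faults):
    """Yield each sequence obtained by inserting exactly one fault into `events`.

    Assumption: the fault always follows at least one event, so position 0
    (before the first event) is not used.
    """
    events = list(events)
    for fault in faults:
        for pos in range(1, len(events) + 1):
            yield events[:pos] + [fault] + events[pos:]

# Hypothetical fault-free sequence and a single fault type.
candidates = list(single_fault_sequences(["HJoin", "Graft", "GAck"], ["loss"]))
for seq in candidates:
    print(seq)
```

Each candidate would still have to be evaluated against the protocol model; only those that end in an incorrect stable state qualify as test sequences.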

We define two sequences, $T = \langle e_1, e_2, \ldots, e_n \rangle$ and
$T' = \langle e_1, e_2, \ldots, e_j, f, e_k, \ldots, e_n \rangle$, where $e_i$ is an event and
$f$ is a fault. Let $P(q,T)$ be the sequence of states and stimuli of
protocol $P$ under test $T$ starting from the initial state $q$.
$T'$ is a test sequence if the final state of $P(q,T')$ is incorrect; i.e., the
stable state reached after the occurrence of the fault does
not satisfy the protocol correctness conditions (see Section II-E.2),
irrespective of $P(q,T)$. In case of a fault-free sequence,



The correct function of a multicast routing protocol in gen- 
eral, is to deliver data from senders to group members (only 
those that have joined the group) without any data loss. For 
our methods, we only assume that a correctness definition 
is given by the protocol designer or specification. For illus- 
tration, we discuss the protocol errors and the correctness 
conditions. 
Fig. 3. Test pattern dimensions



The events are actions performed by the host and act as 
input to the system; for example, join, leave, or send packet. 
The topology is the routed topology of a set of nodes and links.
The nodes run the set of protocols under test or other supporting
protocols. The links can be either point-to-point links or LANs.
This model may be extended later to represent various delays and
bandwidths between pairs of nodes, by using a virtual LAN matrix.
The fault model is used to inject the fault into the test.
According to our single-message loss model, for example, a fault
may denote the 'loss
of the second message of type prune traversing a certain link'. 
Knowing the location and the triggering action of the fault 
is important in analyzing the protocol behavior. 

E. Brief description of PIM-DM 

For our robustness studies, we apply our automatic test 
generation algorithms to a version of the Protocol Independent
Multicast-Dense Mode, or PIM-DM. The description given here is
useful for the sections that follow.

PIM-DM uses broadcast-and-prune to establish the multi- 
cast distribution trees. In this mode of operation, a multicast 
packet is broadcast to all leaf subnetworks. Subnetworks with 
no local members send prune messages towards the source(s) 
of the packets to stop further broadcasts. 

Routers with new members joining the group trigger Graft 
messages towards previously pruned sources to re-establish 
the branches of the delivery tree. Graft messages are acknowledged
explicitly at each hop using the Graft-Ack message.

PIM-DM uses the underlying unicast routing tables to get 
the next-hop information needed for the RPF (reverse-path- 
forwarding) checks. This may lead to situations where there 
are multiple forwarders for a LAN. The Assert mechanism 
prevents these situations and ensures there is at most one 
forwarder for a LAN. 



E.1 PIM Protocol Errors

In this study we target protocol design and specification 
errors. We are interested mainly in erroneous stable (i.e. 
non-transient) states. In general, the protocol errors may 
be defined in terms of the end-to-end behavior as functional 
correctness requirements. In our case, for PIM-DM, an error 
may manifest itself in one of the following ways: 

1) black holes: consecutive packet loss between periods of
packet delivery,
2) packet looping: the same packet traverses the same set of
links multiple times,
3) packet duplication: multiple copies of the same packet are
received by the same receiver(s),
4) join latency: lack of packet delivery after a receiver joins
the group,
5) leave latency: unnecessary packet delivery after a receiver
leaves the group, and
6) wasted bandwidth: unnecessary packet delivery to network
links that do not lead to group members.

E.2 Correctness Conditions 

We assume that correctness conditions are provided by the 
protocol designer or the protocol specification. These condi- 
tions are necessary to avoid the above protocol errors in a 
LAN environment, and include 

1. If one (or more) of the routers is expecting to receive pack- 
ets from the LAN, then one other router must be a forwarder 
for the LAN. Violation of this condition may lead to data loss 
(e.g. join latency or black holes). 

2. The LAN must have at most one forwarder at a time. Vi- 
olation of this condition may lead to data packet duplication. 

3. The delivery tree must be loop-free: 

(a) Any router should accept packets from one incoming in- 
terface only for each routing entry. This condition is enforced 
by the RPF (Reverse Path Forwarding) check. 

(b) The underlying unicast topology should be loop-free.
Violation of this condition may lead to data packet looping. 

4. If one of the routers is a forwarder for the LAN, then there 
must be at least one router expecting packets from the LAN.
Violation of this condition may lead to leave latency. 
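
Conditions 1, 2 and 4 can be phrased as simple counts over the routers on a LAN. The sketch below is one possible encoding, under the assumption that each router is summarized by a single role string ("F" for a forwarder, "NH" for a router expecting packets from the LAN, anything else for other routers). Condition 3 (loop freedom) is not modeled here, and the function name is hypothetical.

```python
def check_lan_state(roles):
    """Return the list of violated correctness conditions for one LAN.

    `roles` holds one role string per router: "F" marks a forwarder and
    "NH" marks a router expecting packets from the LAN (assumed encoding).
    """
    forwarders = roles.count("F")
    expecting = roles.count("NH")
    violations = []
    if expecting >= 1 and forwarders == 0:
        violations.append("condition 1: receiver without forwarder (data loss)")
    if forwarders > 1:
        violations.append("condition 2: multiple forwarders (duplication)")
    if forwarders >= 1 and expecting == 0:
        violations.append("condition 4: forwarder without receiver (leave latency)")
    return violations

print(check_lan_state(["F", "NH", "NC", "NC"]))  # correct stable state
print(check_lan_state(["F", "F", "NH"]))         # duplicate forwarders
```

Such a check is only meaningful for stable states; transient states may violate these conditions temporarily without indicating an error.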

Footnote: Join and leave latencies may be considered in other contexts
as performance issues. However, in our study we treat them as errors.

Footnote: These are the correctness conditions for stable states, i.e.,
not during transients, and are defined in terms of protocol states (as
opposed to end-point behavior). The mapping from functional correctness
requirements for multicast routing to the definition in terms of the
protocol model is currently done by the designer. The automation of
this process is part of future research.

Footnote: Some esoteric scenarios of route flapping may lead to
multicast loops, in spite of RPF checks. Currently, our study does not
address this issue, as it does not pertain to a localized behavior.

III. Search-based Approaches

The problem of test synthesis can be viewed as a search
problem. By searching the possible sequences of events and
faults over network topologies and checking for design
requirements (either correctness or performance), we can
construct the test scenarios that stress the protocol. However,
due to the state space explosion, techniques must be used
to reduce the complexity of the space to be searched. We
attempt to use these techniques to achieve high test quality
and protocol coverage.

In the following, we present the GFSM model for the case study
protocol (PIM-DM), and use it as an illustrative example to
analyze the complexity of the state space and the search problem,
as well as to illustrate the algorithmic details and principles
involved in fault-independent test generation (FITG) and
fault-oriented test generation (FOTG).

A. The Protocol Model 



We represent the protocol as a finite state machine (FSM)
and the overall LAN system by a global FSM (GFSM).

I. FSM model: Every instance of the protocol, running on
a single router, is modeled by a deterministic FSM consisting
of: (i) a set of states, (ii) a set of stimuli causing state
transitions, and (iii) a state transition function (or table)
describing the state transition rules. For a system $i$, this is
represented by the machine $M_i = (\mathcal{S}, \tau_i, \delta_i)$, where $\mathcal{S}$ is a
finite set of state symbols, $\tau_i$ is the set of stimuli, and $\delta_i$ is
the state transition function $\mathcal{S} \times \tau_i \rightarrow \mathcal{S}$.

II. Global FSM model: The global state is defined as the
composition of individual router states. The output messages
from one router may become input messages to other
routers. Such interaction is captured by the GFSM model in
the global transition table. The behavior of a system with $n$
routers may be described by $M_g = (\mathcal{S}_g, \tau_g, \delta_g)$, where
$\mathcal{S}_g = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_n$ is the global state space,
$\tau_g = \bigcup_{i=1}^{n} \tau_i$ is the set of stimuli, and $\delta_g$ is the global
state transition function $\mathcal{S}_g \times \tau_g \rightarrow \mathcal{S}_g$.

The fault model is integrated into the GFSM model. For
message loss, the transition caused by the message is either
nullified or modified, depending on the selective loss pattern.
Crashes may be treated as stimuli causing the routers affected
by the crash to transition into a crashed state. Network delays
are modeled (when needed) through the delay matrix
presented in Section VII.

Footnote: The crashed state may be one of the states already defined
for the protocol, like the empty state, or may be a new state that was
not defined previously for the protocol.

B. PIM-DM Model

Following is the model of a simplified version of PIM-DM.

B.1 FSM model $M_i = (\mathcal{S}_i, \tau_i, \delta_i)$

For a given group and a given source (i.e., for a specific
source-group pair), we define the states w.r.t. a specific LAN
to which the router $R_i$ is attached. For example, a state
may indicate that a router is a forwarder for (or a receiver
expecting packets from) the LAN.

B.1.a System States ($\mathcal{S}$). Possible states in which a router
may exist are:

State Symbol       Meaning
$F_i$              Router i is a forwarder for the LAN
$F_{i\_Timer}$     Router i is a forwarder with timer Timer running
$NF_i$             Upstream router i is a non-forwarder
$NH_i$             Router i has the LAN as its next-hop
$NH_{i\_Timer}$    Same as $NH_i$, with timer Timer running
$NC_i$             Router i has a negative-cache entry
$EU_i$             Upstream router i is empty
$ED_i$             Downstream router i is empty
$M_i$              Downstream router with attached member
$NM_i$             Downstream router with no members

The possible states for upstream and downstream routers
are as follows:

$$
\mathcal{S}_i =
\begin{cases}
\{F_i, F_{i\_Timer}, NF_i, EU_i\}, & \text{if the router is upstream;} \\
\{NH_i, NH_{i\_Timer}, NC_i, M_i, NM_i, ED_i\}, & \text{if the router is downstream.}
\end{cases}
$$



B.1.b Stimuli ($\tau$). The stimuli considered here include
transmitting and receiving protocol messages, timer events,
and external host events. Only stimuli leading to a change
of state are considered. For example, transmitting messages
per se (vs. receiving messages) does not cause any change of
state, except for the Graft, in which case the Rtx timer is
set. Following are the stimuli considered in our study:

1. Transmitting messages: Graft transmission ($Graft_{Tx}$).

2. Receiving messages: Graft reception ($Graft_{Rcv}$), Join
reception (Join), Prune reception (Prune), Graft Acknowledgement
reception (GAck), Assert reception (Assert), and
forwarded packets reception (FPkt).

3. Timer events: these events occur due to timer expiration
(Exp) and include the Graft re-transmission timer (Rtx),
the event of its expiration (RtxExp), the forwarder-deletion
timer (Del), and the event of its expiration (DelExp). We
refer to the event of timer expiration as (TimerImplication).

4. External host events (Ext): include a host sending packets
(SPkt), a host joining a group (HJoin or HJ), and a host
leaving a group (Leave or L).

$\tau = \{Join, Prune, Graft_{Tx}, Graft_{Rcv}, GAck, Assert, FPkt, Rtx, Del, SPkt, HJ, L\}$.

B.2 Global FSM model 

Subscripts are added to distinguish different routers.
These subscripts are used to describe router semantics and
how routers interact on a LAN. An example global state for
a topology of 4 routers connected to a LAN, with router 1
as a forwarder, router 2 expecting packets from the LAN,
and routers 3 and 4 having negative caches, is given by
$\{F_1, NH_2, NC_3, NC_4\}$. For the global stimuli $\tau_g$, subscripts
are added to stimuli to denote their originators and recipients
(if any). The global transition rules $\delta_g$ are extended to
encompass the router and stimuli subscripts.

Footnote: Semantics of the global stimuli and global transitions will
be described as needed (see Section V).
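
The following minimal sketch shows one way a global transition with selective loss could be composed from per-router transitions: a multicast message is applied to every router on the LAN except the sender and the routers that lose it, for which the transition is nullified. The two-entry transition table is a hypothetical toy and is not the PIM-DM transition table; the state names and function names are likewise illustrative assumptions.

```python
# Hypothetical toy transition rules (NOT the PIM-DM transition table).
TOY_DELTA = {
    ("F", "Prune"): "F_Timer",  # assumed: a Prune starts the forwarder's timer
    ("F_Timer", "Join"): "F",   # assumed: a Join cancels the pending deletion
}

def step(state, stimulus):
    """Per-router transition; unknown (state, stimulus) pairs leave the state unchanged."""
    return TOY_DELTA.get((state, stimulus), state)

def global_step(global_state, sender, message, lost_at=frozenset()):
    """Apply a multicast `message` sent by router index `sender` to the LAN.

    Routers whose index is in `lost_at` do not receive the message, so
    their transition is nullified (selective loss).
    """
    nxt = list(global_state)
    for i, s in enumerate(global_state):
        if i == sender or i in lost_at:
            continue
        nxt[i] = step(s, message)
    return tuple(nxt)

lan = ("F", "NH", "NC")
print(global_step(lan, sender=2, message="Prune"))               # delivered to all
print(global_step(lan, sender=2, message="Prune", lost_at={0}))  # lost at router 0
```

Representing the global state as a tuple of per-router symbols keeps the per-router FSM unchanged while letting the fault model decide which routers observe each stimulus.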






C. Defining stable states 

We are concerned with stable state (i.e. non-transient) be- 
havior, defined in this section. To obtain erroneous stable 
states, we need to define the transition mechanisms between 
such states. We introduce the concept of transition classifi- 
cation and completion to distinguish between transient and 
stable states. 

C.1 Classification of Transitions

We identify two types of transitions; externally triggered 
(ET) and internally triggered (IT) transitions. The former is 
stimulated by events external to the system (e.g., HJoin or 
Leave), whereas the latter is stimulated by events internal to 
the system (e.g., FPkt or Graft). 

We note that some transitions may be triggered by either
internal or external events, depending on the scenario.
For example, a Prune may be triggered by the forwarding of
packets by an upstream router, FPkt (which is an internal
event), or by a Leave (which is an external event).

A global state is checked for correctness at the end of an 
externally triggered transition after completing its dependent 
internally triggered transitions. 

Following is a table of host events and their dependent ET and
IT events:

Host events    SPkt                   HJoin    Leave
ET events      FPkt                   Graft    Prune
IT events      Assert, Prune, Join    GAck     Join



C.2 Transition Completion

To check for the global system correctness, all stimulated 
internal transitions should be completed, to bring the system 
into a stable state. Intermediate (transient) states should 
not be checked for correctness (since they may temporarily 
seem to violate the correctness conditions set forth for sta- 
ble states, and hence may give false error indication). The 
process of identifying complete transitions depends on the 
nature of the protocol. But, in general, we may identify a 
complete transition sequence, as the sequence of (all) transi- 
tions triggered due to a single external stimulus (e.g., HJoin 
or Leave). Therefore, we should be able to identify a tran- 
sition based upon its stimuli (either external or internal). 
At the end of each complete transition sequence the system 
exists in either a correct or erroneous stable state. Event- 
triggered timers (e.g., Del, Rtx) fire at the end of a complete 
transition. 
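
The idea of checking correctness only at the end of a complete transition sequence can be sketched as follows. The dependency map copies the host-event table given earlier in this section; the function name and the callbacks `apply_event` and `is_correct` are hypothetical placeholders for the protocol model and the correctness check. As a simplification, the sketch assumes that every dependent event fires, whereas in the real protocol this depends on the current state.

```python
from collections import deque

# Dependent events per host event, copied from the host-event table.
DEPENDENT_EVENTS = {
    "SPkt":  {"ET": ["FPkt"],  "IT": ["Assert", "Prune", "Join"]},
    "HJoin": {"ET": ["Graft"], "IT": ["GAck"]},
    "Leave": {"ET": ["Prune"], "IT": ["Join"]},
}

def complete_transition(state, host_event, apply_event, is_correct):
    """Run one host event and its dependent events, then check once.

    Intermediate (transient) states are never passed to `is_correct`.
    """
    deps = DEPENDENT_EVENTS[host_event]
    pending = deque([host_event, *deps["ET"], *deps["IT"]])
    while pending:
        state = apply_event(state, pending.popleft())
    return state, is_correct(state)

# Usage with a toy model that merely records the events it has seen.
trace, ok = complete_transition(
    (), "Leave",
    apply_event=lambda s, e: s + (e,),
    is_correct=lambda s: True,
)
print(trace, ok)
```

Deferring the check until the pending queue is empty is what avoids false error indications from transient states.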

D. Problem Complexity 

The problem of finding test scenarios leading to proto- 
col error can be viewed as a search problem of the protocol 
state space. Conventional reachability analysis attempts 
to investigate this space exhaustively and incurs the 'state 
space explosion' problem. To circumvent this problem we
use search reduction techniques based on domain-specific
information about multicast routing.

In this section, we give the complexity of exhaustive search,
then discuss the reduction techniques we employ based on the
notion of equivalence, and give the complexity of the reduced
state space.

D.1 Complexity of exhaustive search

Exhaustive search attempts to generate all states reachable
from initial system states. For a system of $n$ routers where
each router may exist in any state $s_i \in \mathcal{S}$, and $|\mathcal{S}| = s$ states,
the number of reachable states in the system is bounded by
$s^n$. With $l$ possible transitions we need $l \cdot s^n$ state visits
to investigate all transitions. Faults, such as message loss
and crashes, increase the branching factor and may introduce
new states, increasing $|\mathcal{S}|$. For our case study $|\mathcal{S}| = 10$,
while selective loss and crashes increase branching by almost
a factor of 9.

D.2 State reduction through equivalence 

Exhaustive search has exponential complexity. To reduce
this complexity we use the notion of equivalence. Intuitively,
in multicast routing the order in which the states are considered
is irrelevant (e.g., whether router $R_1$ or $R_4$ is a forwarder is
insignificant, so long as there is only one forwarder). Hence,
we can treat the global state as an unordered collection (a
multiset) of state symbols. This concept is called 'counting
equivalence'. By definition, the notion of equivalence implies
that by investigating the equivalent subspace we can test for
protocol correctness. That is, if the equivalent subspace is
verified to be correct then the protocol is correct, and if there
is an error in the protocol then it must exist in the equivalent
subspace.
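
Counting equivalence can be illustrated by mapping each global state to a canonical form that keeps only how many routers are in each state. The sketch below is a minimal illustration; the function name `canonical` and the use of plain strings for state symbols are assumptions.

```python
from collections import Counter

def canonical(global_state):
    """Canonical form of a global state under counting equivalence.

    Router identities are dropped; only the number of routers in each
    state symbol is kept.
    """
    return tuple(sorted(Counter(global_state).items()))

a = canonical(("F", "NH", "NC", "NC"))   # router 1 is the forwarder
b = canonical(("NC", "F", "NC", "NH"))   # router 2 is the forwarder
c = canonical(("F", "F", "NC", "NH"))    # two forwarders
print(a == b, a == c)
```

A search procedure can store visited states in this canonical form, so that all permutations of the same state symbols are explored only once.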

Footnote: Crashes force any state to the empty state.

Footnote: Two system states $(q_1, q_2, \ldots, q_n)$ and $(p_1, p_2, \ldots, p_n)$ are
strictly equivalent iff $q_i = p_i$, where $q_i, p_i \in \mathcal{S}$, $\forall\, 1 \le i \le n$.
However, all routers use the same deterministic FSM model, hence all
$n!$ permutations of $(q_1, q_2, \ldots, q_n)$ are equivalent. A global state for
a system with $n$ routers may be represented as $\prod_{i=1}^{|\mathcal{S}|} s_i^{k_i}$, where
$k_i$ is the number of routers in state $s_i \in \mathcal{S}$ and $\sum_{i=1}^{|\mathcal{S}|} k_i = n$.
Formally, counting equivalence states that two system states
$\prod_{i=1}^{|\mathcal{S}|} s_i^{k_i}$ and $\prod_{i=1}^{|\mathcal{S}|} s_i^{l_i}$ are equivalent if $k_i = l_i\ \forall i$.

Footnote: The notion of counting equivalence also applies to
transitions and faults. Those transitions or faults leading to
equivalent states are considered equivalent.

D.2.a Symbolic representation. We use a symbolic
representation as a convenient form of representing the global
state to illustrate the notion of equivalence and to help in
defining the error and correct states in a succinct manner.
In the symbolic representation, $r$ routers in state $q$ are
represented by $q^r$. The global state for a system of $n$ routers
is represented by $G = (q_1^{r_1}, q_2^{r_2}, \ldots, q_m^{r_m})$, where $m = |\mathcal{S}|$ and
$\sum r_i = n$. For symbolic representation of topologies where $n$
is unknown, $r_i \in [0, 1, 2, 1+, *]$ ('1+' is 1 or more, and '*' is 0
or more).

To satisfy the correctness conditions for PIM-DM, the 
correct stable global states are those containing no for- 
warders and no routers expecting packets, or those con- 
taining one forwarder and one or more routers expecting 
packets from the link; symbolically this may be given by: 
$G_1 = (F^0, NH^0, NC^*)$, and $G_2 = (F^1, NH^{1+}, NC^*)$.

We use X to denote any state $s_i \in S$. For example, $\{X - F\}^*$
denotes 0 or more states $s_i \in S - \{F\}$. This symbolic
representation is used to estimate the size of the reduced
state space.

D.2.b Complexity of the state space with equiva- 
lence reduction. Considering counting equivalence, find- 
ing the number of equivalent states becomes a problem of 
combinatorics. The number of equivalent states becomes
$C(n+s-1, n) = \frac{(n+s-1)!}{n!\,(s-1)!}$, where n is the number of routers,
s is the number of state symbols, and $C(x, y) = \frac{x!}{y!\,(x-y)!}$ is
the number of y-combinations of an x-set.
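
As a quick sanity check of this count, the following Python sketch enumerates the equivalence classes directly and compares their number with the closed form, next to the unreduced count $s^n$ of ordered global states. The symbol list and the function names are illustrative assumptions, not the full PIM-DM state set or the procedures of the appendices.

```python
from itertools import combinations_with_replacement
from math import comb

def equivalent_states(symbols, n):
    """All global states of n routers up to counting equivalence.

    Each class is represented by one sorted tuple (a multiset of symbols).
    """
    return list(combinations_with_replacement(sorted(symbols), n))

def reduced_count(n, s):
    """Closed form C(n + s - 1, n) for the number of equivalence classes."""
    return comb(n + s - 1, n)

# Placeholder symbol set (assumption): any s distinct state symbols would do.
SYMBOLS = ["F", "NF", "NH", "NC"]

for n in range(1, 6):
    classes = equivalent_states(SYMBOLS, n)
    # The enumeration and the closed form must agree.
    assert len(classes) == reduced_count(n, len(SYMBOLS))
    # Columns: routers, ordered states s^n, equivalence classes.
    print(n, len(SYMBOLS) ** n, len(classes))
```

The ordered count grows exponentially in n, while the number of classes grows polynomially for a fixed number of symbols.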

D.3 Representation of error and correct states 

Depending on the correctness definition we may get differ- 
ent counts for the number of correct or error states. To get an 
idea about the size of the correct or error state space for our 
case study, we take two definitions of correctness and com- 
pute the number of correct states. For the correct states of 
PIM-DM, we either have: (1) no forwarders with no routers 
expecting packets from the LAN, or (2) exactly one forwarder 
with routers expecting packets from the LAN ^.

The correct space and the erroneous space must be disjoint
and complete (i.e., together they must add up to the complete
space), otherwise the specification is incorrect. See Appendix
I-A for details.

We present two correctness definitions that are used in our 
case. 

• The first definition considers the forwarder states as F and 
the routers expecting packets from the LAN as NH. Hence, 
the symbolic representation of the correct states becomes: 
$(\{X - NH - F\}^*)$, or $(NH, F, \{X - F\}^*)$,

^For convenience, we may represent these two states as $G_1 = (NC^*)$,
and $G_2 = (F, NH^{1+}, NC^*)$.

^These conditions we have found to be reasonably sufficient to meet the
functional correctness requirements. However, they may not be necessary,
hence the search may generate false errors. Proving necessity is part of future
work.



and the number of correct states is: $C(n+s-3, n) + C(n+s-4, n-2)$.

• The second definition considers the forwarder states as
$\{F_i, F_{i\_Del}\}$ or simply $F_X$, and the states expecting packets
from the LAN as $\{NH_i, NH_{i\_Rtx}\}$ or simply $NH_X$. Hence,
the symbolic representation of the correct states becomes:
$(\{X - NH_X - F_X\}^*)$, or $(NH_X, F_X, \{X - F_X\}^*)$,
and the number of correct states is:

$C(n+s-5, n) + 4 \cdot C(n+s-5, n-2) - 2 \cdot C(n+s-6, n-3)$.

Refer to Appendix I-B for more details on deriving the 
number of correct states. 
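
The count for the first definition can be cross-checked by brute force. The sketch below (Python) enumerates all global states up to counting equivalence, keeps those with no F and no NH or with exactly one F and at least one NH, and compares the total with $C(n+s-3, n) + C(n+s-4, n-2)$. The symbol set is a placeholder assumption with s = 5; only the presence of F and NH matters to the check.

```python
from itertools import combinations_with_replacement
from math import comb

def is_correct(state):
    """First definition: no F and no NH, or exactly one F with at least one NH."""
    f, nh = state.count("F"), state.count("NH")
    return (f == 0 and nh == 0) or (f == 1 and nh >= 1)

def count_correct(symbols, n):
    """Brute-force count over all multisets of n symbols."""
    return sum(is_correct(g) for g in combinations_with_replacement(symbols, n))

def closed_form(n, s):
    """C(n+s-3, n) + C(n+s-4, n-2); the second term is zero when n < 2."""
    second = comb(n + s - 4, n - 2) if n >= 2 else 0
    return comb(n + s - 3, n) + second

# Placeholder symbol set (assumption), s = 5.
SYMBOLS = ["F", "NH", "NF", "NC", "NM"]

for n in range(1, 7):
    assert count_correct(SYMBOLS, n) == closed_form(n, len(SYMBOLS))
```

The second term counts the states obtained by fixing one F and one NH and choosing the remaining n - 2 routers freely from the s - 1 symbols other than F.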

In general, we find that the size of the error state space, ac- 
cording to both definitions, constitutes the major portion of 
the whole state space. This means that search techniques 
explicitly exploring the error states are likely to be more 
complex than others. We take this into consideration when
designing our methods.

IV. Fault-independent Test Generation 

Fault-independent test generation (FITG) uses the forward 
search technique to investigate parts of the state space. As in 
reachability analysis, forward search starts from initial states 
and applies the stimuli repeatedly to produce the reachable 
state space (or part thereof). Conventionally, an exhaus- 
tive search is conducted to explore the state space. In the 
exhaustive approach all reachable states are expanded until 
the reachable state space is exhausted. We use several man- 
ifestations of the notion of counting equivalence introduced 
earlier to reduce the complexity of the exhaustive algorithm 
and expand only equivalent subspaces. To examine robust- 
ness of the protocol, we incorporate selective loss scenarios 
into the search. 

A. Reduction Using Equivalences 

The search procedure starts from the initial states ^ and
keeps a list of states visited to prevent looping. Each state
is expanded by applying the stimuli and advancing the state
machine forward by implementing the transition rules and
returning a new stable state each time ^. We use the
counting equivalence notion to reduce the complexity of the search
in three stages of the search:

1. The first reduction we use is to investigate only the
equivalent initial states. To achieve this we simply treat the
set of states constituting the global state as an unordered set
instead of an ordered set. For example, the output of such a
procedure for I.S. = {NM, EU} and n = 2 would be:
{NM, NM}, {NM, EU}, {EU, EU}.

One procedure that produces such an equivalent initial state
space is given in Appendix II-B. The complexity of this
algorithm is given by $C(n + i.s. - 1, n)$, where i.s. is the
number of initial state symbols, as was shown earlier
and verified through simulation.

^For our case study the routers start as either a non-member (NM) or
empty upstream routers (EU), that is, the initial states I.S. = {NM, EU}.

^For details of the above procedures, see Appendix II-A.
2. The second reduction we use is during comparison of
visited states. Instead of comparing the actual states, we
compare and store equivalent states. Hence, for example, the
states $\{NF_1, NH_2\}$ and $\{NH_1, NF_2\}$ are equivalent.

3. A third reduction is made based on the observation that
applying identical stimuli to different routers in identical
states leads to equivalent global states. Hence, we can
eliminate some redundant transitions. For example, for the global
state $\{NH_1, NH_2, F_3\}$ a Leave applied to $R_1$ or $R_2$ would
produce the equivalent state $\{NH_1, NC_2, F_3\}$. To achieve
this reduction we add a flag check before advancing the state
machine forward. We call the algorithm after the third
reduction the reduced algorithm.
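
The three reductions can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the function names are hypothetical, the "flag check" is modeled as keeping one representative router per distinct state symbol, and the actual procedures are those of Appendix II.

```python
from collections import Counter
from itertools import combinations_with_replacement

def canonical(global_state):
    """Order-independent key: a sorted tuple of (symbol, count) pairs."""
    return tuple(sorted(Counter(global_state).items()))

def equivalent_initial_states(initial_symbols, n):
    """Reduction 1: one representative per class of initial global states."""
    return [list(c) for c in combinations_with_replacement(initial_symbols, n)]

visited = set()

def first_visit(global_state):
    """Reduction 2: compare and store equivalent states, not literal ones."""
    key = canonical(global_state)
    if key in visited:
        return False
    visited.add(key)
    return True

def representative_routers(global_state):
    """Reduction 3: index of one router per distinct state symbol.

    Applying the same stimulus to any other router in the same state
    would lead to an equivalent global state, so it is skipped.
    """
    seen, reps = set(), []
    for i, symbol in enumerate(global_state):
        if symbol not in seen:
            seen.add(symbol)
            reps.append(i)
    return reps

print(equivalent_initial_states(["NM", "EU"], 2))
print(representative_routers(["NH", "NH", "F"]))
```

For I.S. = {NM, EU} and n = 2 the first call lists the three initial states given in the text, and for {NH, NH, F} a Leave would be applied to only one of the two NH routers.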

In all the above algorithms, a forward step advances the
GFSM to the next stable state. This is done by applying all
the internally dependent stimuli (elicited due to the applied
external stimulus) in addition to timer implications, if
any exist. Only stable states are checked for correctness.

B. Applying the Method 

In this section we discuss how the fault-independent test 
generation can be applied to the model of PIM-DM. We ap- 
ply forward search techniques to study correctness of PIM- 
DM. We first study the complexity of the algorithms without 
faults. Then we apply selective message loss to study the pro- 
tocol behavior and analyze the protocol errors. 

B.1 Method input

The protocol model is provided by the designer or protocol
specification, in terms of a transition table or transition rules
of the GFSM, and a set of initial state symbols. The design
requirements, in terms of correctness in this case, are assumed
to be also given by the protocol specification. This includes
the definition of correct states or erroneous states, in addition
to the fault model if studying robustness. Furthermore, the
equivalence classes need to be provided by the
designer ^. Currently, we do not automate the detection
of equivalence classes. Also, the number of routers in the

^For our case study, the symmetry inherent in multicast over LANs was
used to establish the counting equivalence for states, transitions and faults.

Fig. 4. Simulation statistics for forward algorithms. Expanded States is
the number of stable states visited, Forwards is the number of
forward advances of the state machine, Transitions is the number
of transient states visited, and Errors is the number of stable
state errors detected.



topology or topologies to be investigated (i.e., on the LAN) 
has to be specified. 

B.2 Complexity of forward search for PIM-DM 

The procedures presented above were simulated for PIM- 
DM to study its correctness. This set of results shows behav- 
ior of the algorithms without including faults, i.e., when used 
for verification. We identified the initial state symbols to be 
{NM, EU}; NM for downstream routers and EU for up- 
stream routers. The number of reachable states visited, the 
number of transitions and the number of erroneous states 
found were recorded. A summary of the results is given in
Figure 4. The number of expanded states denotes the number of
visited stable states. The number of 'forwards' is the number 
of times the state machine was advanced forward denoting 
the number of transitions between stable states. The num- 
ber of transitions is the number of visited transient states, 
and the number of error states is the number of stable (or 
expanded) states violating the correctness conditions. The
error condition is given as in the second error condition in
Section III-D.3. Note that each of the other error states is
equivalent to at least one error state detected by the
reduced algorithm. Hence, an algorithm discovering fewer
error states in this case does not lose
any information or causes of error, which follows from the
definition of equivalence. Reducing the number of error states
reduces the time needed to analyze the errors.

We notice that there is a significant reduction in the algorithm
complexity with the use of equivalence relations. In
particular, the number of transitions is reduced from $O(4^n)$ for the
exhaustive algorithm, to $O(n^4)$ for the reduced algorithm.
Similar results were obtained for the number of forwards,
expanded states and number of error states. The
reduction gained by using the counting equivalence is exponential.
A more detailed presentation of the algorithmic details and
results is given in Appendix II.

For robustness analysis (vs. verification), faults are in- 
cluded in the GFSM model. Intuitively, an increase in the 
overall complexity of the algorithms will be observed. Al- 
though we have only applied faults to study the behavior of 
the protocol and not the complexity of the search, we an- 
ticipate similar asymptotic reduction gains using counting 
equivalence. 

B.3 Summary of behavioral errors for PIM-DM 

We used the above algorithm to search the protocol model 
for PIM-DM. Correctness was checked automatically by the 
method by checking the stable states (i.e., after applying 
complete transitions). By analyzing the sequence of events 
leading to error we were able to reason about the protocol be- 
havior. Several PIM-DM errors were detected by the method, 
some pertaining to correctness in the absence of message loss, 
while others were only detected in the presence of message 
loss. We have studied cases of up to 14-router LANs.
Sometimes errors were found to occur in different topologies for
similar reasons, as will be shown. Here, we only discuss results
for the two-router and three-router LAN cases for illustration.

• Only one error was detected in the two-router case. With 
the initial state {EU,EU} (i.e., both routers are upstream 
routers), the system enters the error state {F,NF}, where 
there is a forwarder for the LAN but there are no routers 
expecting packets or attached members. In this case the 
Assert process chose one forwarder for the LAN, but there 
were no downstream routers to Prune off the extra traffic, 
and so the protocol causes wasted bandwidth. 

• Several errors were detected for the 3-router LAN case: 

— Starting from {EU, EU, EU} the system enters the error 
state {F, NF, NF} for a similar reason to that given above.

— Starting from {NM, EU, EU } the system enters the er- 
ror state {NC, NF,F}. By analyzing the trace of events 
leading to the error we notice that the downstream router 
NC pruned off one of the upstream routers, NF, before the 
Assert process takes place to choose a winner for the LAN. 
Hence the protocol causes wasted bandwidth. 

— Starting from {NM, EU, EU} the system enters state 
{NH, F, F}. This is due to the transition table rules: when
a forwarder sends a packet, all upstream routers in the EU
state transit into the F state. This is not an actual error,
however, since the system will recover with the next forwarded
packet using Assert ^. The detection of this false error could
have been avoided by issuing the SPkt stimulus before the error
check, to see if the system will recover with the next packet
sent.

— With message loss, errors were detected for Join and 
Prune loss. When the system is in {NH, NH, F} state and 
one of the downstream members leaves (i.e., issues L event), a 
Prune is sent on the LAN. If this Prune is selectively lost by 
the other downstream router, a Join will not be sent and the 
system enters state {NC, NH, NF}. Similarly, if the Join is 
lost, the protocol ends up in an error state. 

C. Challenges and Limitations 

In order to generalize the fault-independent test generation 
method, we need to address several open research issues and 
challenges. 

• The topology is an input to the method in terms of number
of routers. To add topology synthesis to FITG we may use
the symbolic representation presented in Section III-D, where
repetition constructs ^ may be used to represent
the LAN topology in general. A similar principle was used
for cache coherence protocol verification, where the
state space is split using repetition constructs based on the
correctness definition. In Section V we present a new method
that synthesizes the topology automatically as part of the
search process.

• Equivalence classes are given as input to the method. In 
this study we have used symmetries inherent in multicast 
routing on LANs to utilize equivalence. This symmetry may 
not exist in other protocols or topologies, hence the forward 
search may become increasingly complex. Automating iden- 
tification of equivalence classes is part of future work. 
Other kinds of equivalence may be investigated to reduce
complexity in these cases ^. Also, other techniques for
complexity reduction may be investigated, such as
statistical sampling based on randomization or hashing used in
^This is one case where the correctness conditions for the model are
sufficient but not necessary to meet the functional requirements for correctness,
thus leading to a false error. Sufficiency and necessity proofs are subject of
future work.

^Repetition constructs include, for example, the '*' to represent zero or
more states, or the '1+' to represent one or more states, '2+' two or more,
and so on.

^An example of another kind of equivalence is fault dominance, where a
system is proven to necessarily reach one error before reaching another, thus
the former error dominates the latter error.
SPIN. However, sampling techniques do not achieve full
coverage of the state space.

• The topology used in this study is limited to a single-hop 
LAN. Although we found it quite useful to study multicast 
routing over LANs, the method needs to be extended to
multi-hop LANs to be more general. Our related work
introduces the notion of virtual LAN, and future work addresses
multi-LAN topologies.

In sum, the fault-independent test generation may be used 
for protocol verification given the symmetry inherent in the 
system studied (i.e., protocol and topology). For robustness 
studies, where the fault model is included in the search, the
complexity of the search grows. In this approach we did not
address performance issues or topology synthesis. These
issues are addressed in the coming sections. However, we shall
re-use the notion of forward search and the use of counting
equivalence in the method discussed next.

V. Fault-oriented Test Generation 

In this section, we investigate the fault-oriented test
generation (FOTG), where the tests are generated for specific
faults. In this method, the test generation algorithm starts
from the fault(s) and searches for a possible error,
establishing the necessary topology and events to produce the error.
Once the error is established, a backward search technique
produces a test sequence leading to the erroneous state, if
such a state is reachable. We use the FSM formalism
presented in Section III to represent the protocol. We also re-use
some ideas from the FITG algorithm previously presented,
such as forward search and the notion of equivalence for
search reduction.

A. FOTG Method Overview

Fault-oriented test generation (FOTG) targets specific
faults or conditions, and so is better suited to study
robustness in the presence of faults in general. FOTG has
three main stages: a) topology synthesis, b) forward
implication and error detection, and c) backward implication.
The topology synthesis establishes the necessary components
(e.g., routers and hosts) of the system to trigger the given
condition (e.g., trigger a protocol message). This leads to
the formation of a global state in the middle of the state
space ^. Forward search is then performed from that global
state in its vicinity, i.e., within a complete transition, after
applying the fault. This process is called forward
implication, and uses search techniques similar to those explained
earlier in Section IV. If an error occurs, backward search
is performed thereafter to establish a valid sequence
leading from an initial state to the synthesized global state. To
achieve this, the transition rules are reversed and a search is
performed until an initial state is reached, or the synthesized
state is declared unreachable. This process is called backward
implication.

Much of the algorithmic details are based on condition →
effect reasoning of the transition rules. This reasoning is
emphasized in the semantics of the transition table used in
the topology synthesis and the backward search. Section
V-A.1 describes these semantics. In Section V-B we describe the
algorithmic details of FOTG, and in Section V-C we describe
how FOTG was applied to PIM-DM in our case study, and
present the results and method evaluation. In Section V-D we
discuss the limitations of the method and our findings.

^The global state from which FOTG starts is synthesized for a given fault,
such as a message to be lost.

A.1 The Transition Table

The global state transition may be represented in sev- 
eral ways. Here, we choose a transition table representation 
that emphasizes the effect of the stimuli on the system, and 
hence facilitates topology synthesis. The transition table
describes, for each stimulus, the conditions of its occurrence.
A condition is given as stimulus and state or transition
(denoted by stimulus.state/trans), where the transition is given
as startState → endState.

We further extend message and router semantics to
capture multicast semantics. In the following, we present a detailed
description of the semantics of the transition table, then give
the resulting transition table for our case study, to be used
later in this section.

A.1.a Semantics of the transition table. In this
subsection we describe the message and router semantics,
pre-conditions, and post-conditions.

• Stimuli and router semantics: Stimuli are classified based
on the routers affected by them. Stimuli types include:

1. orig: stimuli or events that occur within the router
originating the stimulus but do not affect other routers; these
include HJ, L, SPkt, Graft_Tx, Del and Rtx.

2. dst: messages that are processed by the destination
router only; these include Join, GAck and Graft_Rcv.

3. mcast: multicast messages that are processed by all
other routers; these include Assert and FPkt.

4. mcastDownstream: multicast messages that are
processed by all other downstream routers, but only one
upstream router; this includes the Prune message.
These types are used by the search algorithm for processing
the stimuli and messages. According to these different types
of stimuli processing, a router may take as subscript 'orig',
'dst', or 'other'. The 'orig' symbol designates the originating
router of the stimulus or message, whereas 'dst' designates
the destination of the message, and 'other' indicates routers other
than the originator. Routers are also classified as upstream
or downstream, as presented earlier.

• Pre-Conditions: The pre-conditions in general are of
the form stimulus.state/transition, where the transition is
given as startState → endState. If there are several
pre-conditions, then we can use a logical OR to represent the
rule. At least one pre-condition is necessary to trigger the
stimulus. An example of a stimulus.state condition is the
condition for the Join message, namely Prune_other.NH_orig, that is,
a Join is triggered by the reception of a Prune from another
router, with the originator of the Join in NH. An example
of a stimulus.transition condition is the condition for Graft
transmission, HJ.(NC → NH), i.e., a host joining and the
transition of the router from the negative cache state to the
next hop state.

• Post-Conditions: A post-condition is an event and/or
transition that is triggered by the stimulus ^.
Post-conditions may be in the form of: (1) transition,
(2) condition.transition, (3) condition.stimulus, and (4)
stimulus.transition.

1. Transition: has an implicit condition with which it is
associated; i.e., 'a → b' means 'if a ∈ GState then a → b'.
For example, the Join post-condition (NF_dst → F_dst) means that if
NF_dst ∈ GState then the transition NF → F will occur.

2. Condition.transition: is the same as (1) except the
condition is explicit ^.

3. Condition.stimulus: if the condition is satisfied then the
stimulus is triggered. For example, the Prune post-condition
'NH_other.Join_other' means that every router x with NH_x ∈ GState
(where x is not equal to orig) triggers
a Join.

4. Stimulus.transition: has the transition condition
implied as in (1) above. For example, the Graft_Rcv post-condition
'GAck.(NF_dst → F_dst)' means that if NF_dst ∈ GState, then the
transition occurs and GAck is triggered.

If more than one post-condition exists, then the logical
relation between them is either an 'XOR' if the router is the
same, or an 'AND' if the routers are different. For example,
the Join post-conditions are 'F_dst_Del → F_dst, NF_dst → F_dst',
which means (F_dst_Del → F_dst) XOR (NF_dst → F_dst).
On the other hand, the Prune post-conditions are 'F_dst →
F_dst_Del, NH_other.Join_other', which implies that the
transition will occur if F_dst ∈ GState AND a Join will be triggered
if NH ∈ GState.

^Network faults, such as message loss, may cause the stimulus not to
take effect. For example, losing a Join message will cause the event of Join
reception not to take effect.

^This does not appear in our case study.

Following is the transition table used in our case study. 



Stimulus    Pre-conditions             Post-conditions
---------   ------------------------   ---------------------------------------
Join        Prune_other.NH_orig        F_dst_Del → F_dst, NF_dst → F_dst
Prune       L.NC, FPkt.NC              F_dst → F_dst_Del, NH_other.Join_other
Graft_Tx    HJ.(NC → NH),              Graft_Rcv.(NH → NH_Rtx)
            RtxExp.(NH_Rtx → NH)
Graft_Rcv   Graft_Tx.(NH → NH_Rtx)     GAck.(NF_dst → F_dst)
GAck        Graft_Rcv.F                NH_dst_Rtx → NH_dst
Assert      FPkt_other.F_orig          F_other → NF_other
FPkt        SPkt.F                     Prune.(NM → NC), ED → NH, M → NH,
                                       EU_other → F_other, F_other.Assert
Rtx         RtxExp                     Graft_Tx.(NH_orig_Rtx → NH_orig)
Del         DelExp                     F_orig_Del → NF_orig
SPkt        Ext                        FPkt.(EU_orig → F_orig)
HJoin       Ext                        NM → M, Graft_Tx.(NC → NH)
Leave       Ext                        M → NM, Prune.(NH → NC),
                                       Prune.(NH_Rtx → NC)



The above pre-conditions can be derived automatically 
from the post-conditions. In Appendix III, we describe the 
'Preconditions' procedure that takes as input one form of the 
conventional post-condition transition table and produces the 
pre-condition semantics. 

A.1.b State Dependency Table. To aid in test sequence
synthesis through the backward implication procedure, we 
construct what we call a state dependency table. This table 
can be inferred automatically from the transition table. We 
use this table to improve the performance of the algorithm 
and for illustration. 

For each state, the dependency table contains the possible
preceding states and the stimulus from which the state can be
reached or implied. To obtain this information for a state s,
the algorithm searches the post-conditions of the transition table for
entries where the endState of a transition is s. In addition,
a state may be identified as an initial state (I.S.), and hence
can be readily established without any preceding states. The
'dependencyTable' procedure in Appendix III generates the
dependency table from the transition table of conditions. For
s ∈ I.S. a symbol denoting initial state is added to the array
entry. For our case study I.S. = {NM, EU}.

^There is an implicit condition that can never be satisfied in both
statements, which is the existence of dst in only one state at a time.

Based on the above transition table, following is the resulting state
dependency table ^:



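State       Possible Backward Implication(s)
---------   ------------------------------------------------------------
F_i         ←(FPkt) EU_i, ←(Join) F_i_Del, ←(Graft_Rcv) NF_i, ←(SPkt) EU_i
F_i_Del     ←(Prune) F_i
NF_i        ←(Del) F_i_Del, ←(Assert) F_i
NH_i        ←(Rtx, GAck) NH_i_Rtx, ←(HJoin) NC_i, ←(FPkt) M_i, ←(FPkt) ED_i
NH_i_Rtx    ←(Graft_Tx) NH_i
NC_i        ←(FPkt) NM_i, ←(Leave) NH_i_Rtx, ←(Leave) NH_i
EU_i        ← I.S.
ED_i        ← I.S.
M_i         ←(HJoin) NM_i
NM_i        ←(Leave) M_i, ← I.S.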



In cases where the stimulus affects more than one router
(e.g., multicast Prune), multiple states need to be
simultaneously implied in one backward step, otherwise an I.S.
may not be reached. To do this, the transitions in the
post-conditions of the stimulus are traversed, and any states in
the global state that are endStates are replaced by their
corresponding startStates. For example, FPkt takes {M_i, NM_j, F_k}
to {NH_i, NC_j, F_k}, so the backward step replaces NH_i and NC_j
by M_i and NM_j simultaneously. This is taken care of by the backward
implication section described later.
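
The derivation of the dependency table from the post-conditions can be sketched as follows (Python). The triples below are a simplified, partial reading of the transition table with the orig/dst/other subscripts dropped; they and the function name are assumptions for illustration, and the actual 'dependencyTable' procedure is the one in Appendix III.

```python
from collections import defaultdict

# (stimulus, startState, endState) triples: a simplified, partial encoding
# of post-condition transitions (assumption, subscripts dropped).
TRANSITIONS = [
    ("Join", "F_Del", "F"), ("Join", "NF", "F"),
    ("Prune", "F", "F_Del"),
    ("Assert", "F", "NF"),
    ("Del", "F_Del", "NF"),
    ("FPkt", "EU", "F"), ("FPkt", "M", "NH"), ("FPkt", "NM", "NC"),
    ("HJoin", "NM", "M"), ("HJoin", "NC", "NH"),
    ("Leave", "M", "NM"), ("Leave", "NH", "NC"),
]
INITIAL_STATES = {"NM", "EU"}

def dependency_table(transitions, initial_states):
    """Map each state to the (stimulus, precedingState) pairs that produce it."""
    table = defaultdict(list)
    for stimulus, start, end in transitions:
        # An entry is recorded wherever the endState of a transition is the state.
        table[end].append((stimulus, start))
    for s in initial_states:
        # Initial states can be established with no preceding state.
        table[s].append(("I.S.", None))
    return dict(table)

for state, implications in sorted(dependency_table(TRANSITIONS, INITIAL_STATES).items()):
    print(state, implications)
```

Each list is read as an OR of possible backward implications for a single router; the simultaneous rollback of several routers for multicast stimuli is not modeled here.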

B. FOTG details 

As previously mentioned, our FOTG approach consists of
three phases: I) synthesis of the global state to inspect, II)
forward implication, and III) backward implication. These
phases are explained in more detail in this section. In
Section V-C we present an illustrative example for these
phases.

B.1 Synthesizing the Global State

Starting from a condition (e.g., protocol message or
stimulus), and using the information in the protocol model (i.e.,
the transition table), a global state is synthesized for
investigation. We refer to this state as the global-state inspected
($G_I$), and it is obtained as follows:

1. The global state is initially empty and the inspected
stimulus is initially set to the stimulus investigated.

2. For the inspected stimulus, the state(s) (or the
startState(s) of the transition) of the post-condition are
obtained from the transition table. If these states do not exist
in the global state, and cannot be inferred therefrom, then
they are added to the global state.

3. For the inspected stimulus, the state(s) (or the
endState(s) of the transition) of the pre-condition are
obtained. If these states do not exist in the global state, and
cannot be inferred therefrom, then they are added to the
global state.

4. Get the stimulus of the pre-condition of the inspected
stimulus, call it newStimulus. If newStimulus is not
external (Ext), then set the inspected stimulus to
newStimulus, and go back to step 2.

^The possible backward implications are separated by 'commas' indicating
an 'OR' relation.

The second step considers post-conditions and adds system
components that will be affected by the stimulus, while the
third and fourth steps synthesize the components necessary to
trigger the stimulus. The procedure given in Appendix III
synthesizes minimum topologies necessary to trigger a given
stimulus of the protocol.

Note that there may be several pre-conditions or post- 
conditions for a stimulus, in which case several choices can be 
made. These represent branching points in the search space. 
At the end of this stage, the global state to be investigated 
is obtained. 
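
The four synthesis steps can be sketched as a short loop (Python). This is a minimal sketch under stated assumptions, not the Appendix III procedure: the table keeps a single pre-condition per stimulus (so branching points are ignored), host and source events are treated as external, and the "cannot be inferred therefrom" test is reduced to a caller-supplied predicate. All names are hypothetical.

```python
# Hypothetical, simplified table: for each stimulus, one pre-condition as
# (triggering stimulus, required states) and the startStates of its
# post-conditions.
TABLE = {
    "Join":  {"pre": ("Prune", ["NH"]), "post_start": ["NF"]},
    "Prune": {"pre": ("Leave", ["NC"]), "post_start": ["F", "NH"]},
}

# Assumption: stimuli whose own pre-condition is Ext end the chain.
EXTERNAL = {"Ext", "Leave", "HJoin", "SPkt"}

def synthesize_global_state(stimulus, table,
                            inferable=lambda state, gstate: False):
    """Collect the states needed to trigger `stimulus` (steps 1 to 4)."""
    gstate = []                          # step 1: empty global state
    inspected = stimulus
    while inspected not in EXTERNAL:
        pre_stimulus, pre_states = table[inspected]["pre"]
        # Steps 2 and 3: states from the post-conditions, then the pre-condition.
        for state in table[inspected]["post_start"] + pre_states:
            if state not in gstate and not inferable(state, gstate):
                gstate.append(state)
        inspected = pre_stimulus         # step 4: follow the triggering stimulus
    return gstate

print(synthesize_global_state("Join", TABLE))
```

With the default predicate nothing is ever inferred, so every missing state is added; a stricter predicate yields a smaller synthesized topology.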

B.2 Forward Implication 

The states following $G_I$ (i.e., $G_{I+i}$ where $i > 0$) are obtained
through forward implication. We simply apply the
transitions, starting from $G_I$, as given by the transition table, in
addition to implied transitions (such as timer implication).
Furthermore, faults are incorporated into the search. For
example, in the case of a message loss, the transition that
would have resulted from the message is not applied. If more
than one state is affected by the message, then the space is
expanded to include the various selective loss scenarios for
the affected routers. For crashes, the routers affected by the
crash transit into the crashed state as defined by the
expanded transition rules, as will be shown in Section V-C.
Forward implication uses the forward search techniques
described earlier in Section IV.

According to the transition completion concept (see Sec- 
tion III-C.2), the proper analysis of behavior should start 
from externally triggered transitions. For example, the anal- 
ysis should not consider a Join without considering the 
Prune triggering it and its effects on the system. Thus, the 
global system state must be rolled back to the beginning of 
a complete transition (i.e. the previous stable state) before 
applying the forward implication. This will be implied in the 
forward implication algorithm to simplify the discussion. 
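
The expansion into selective loss scenarios can be illustrated with a small Python sketch. It assumes a single multicast message, enumerates every non-empty subset of affected routers that may miss it, and skips the corresponding transition for those routers. The helper names and the per-state transition map are illustrative assumptions only.

```python
from itertools import combinations

def selective_loss_scenarios(affected):
    """Yield every non-empty subset of affected routers that misses the message."""
    for k in range(1, len(affected) + 1):
        for losers in combinations(affected, k):
            yield set(losers)

def apply_with_loss(gstate, transitions, losers):
    """Apply per-router transitions, leaving routers that lost the message as-is."""
    return [state if i in losers else transitions.get(state, state)
            for i, state in enumerate(gstate)]

# Illustration (assumption): an FPkt sent by router 0 affects routers 1 and 2.
FPKT = {"EU": "F", "M": "NH", "NM": "NC"}
gstate = ["F", "EU", "NM"]
for losers in selective_loss_scenarios([1, 2]):
    print(sorted(losers), apply_with_loss(gstate, FPKT, losers))
```

For k affected routers this produces 2^k - 1 scenarios, which is why the affected set is kept to the routers that actually process the message.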
B.3 Backward Implication

Backward implication attempts to obtain a sequence of
events leading to $G_I$ from an initial state (I.S.), if such a
sequence exists; i.e., if $G_I$ is reachable from I.S.

The state dependency table described in Section V-A.1.b
is used in the backward search.

Backward steps are taken for the components in the global
state $G_I$, each step producing another global state GState.
For each state in GState, possible backward implication rules
are attempted to obtain valid backward steps toward an
initial state. This process is repeated for preceding states in a
depth-first fashion. A set of visited states is maintained to
avoid looping. If all backward branches are exhausted and
no initial state is reached, the state is declared unreachable.

To rewind the global state one step backward, the re- 
verse transition rules are applied. Depending on the stim- 
ulus type of the backward rule, different states in GState 
are rolled back. For orig and dst, only the originator and
the destination of the stimulus are rolled back, respectively. For
mcast, all affected states are rolled back except the origina- 
tor. mcastDownstream is similar to mcast except that all 
downstream routers or states are rolled back, while only one 
upstream router (the destination) is rolled back. Appendix 
III shows procedures 'Backward' and 'Rewind' that imple- 
ment the above steps. 
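
The depth-first backward search with a visited set can be sketched as follows (Python). This is not the 'Backward' or 'Rewind' procedure of Appendix III: the predecessor function is supplied by the caller and is assumed to return only valid backward steps, and the toy rules below roll back one router at a time and are not the PIM-DM rules.

```python
def backward_search(gstate, predecessors, is_initial, visited=None):
    """Depth-first backward implication.

    Returns a list of global states from an initial state to `gstate`,
    or None if `gstate` is declared unreachable.
    """
    visited = set() if visited is None else visited
    key = tuple(sorted(gstate))          # counting-equivalence key
    if key in visited:
        return None                      # already explored: avoid looping
    visited.add(key)
    if is_initial(gstate):
        return [gstate]
    for prev in predecessors(gstate):    # candidate backward steps
        path = backward_search(prev, predecessors, is_initial, visited)
        if path is not None:
            return path + [gstate]
    return None                          # all branches exhausted

# Toy single-router backward rules (assumption, for illustration only).
BACK = {"NH": ["M", "NC"], "M": ["NM"], "NC": ["NM"], "F": ["EU"]}

def toy_predecessors(gstate):
    for i, s in enumerate(gstate):
        for p in BACK.get(s, []):
            yield gstate[:i] + (p,) + gstate[i + 1:]

def toy_is_initial(gstate):
    return all(s in {"NM", "EU"} for s in gstate)

print(backward_search(("NH", "F"), toy_predecessors, toy_is_initial))
```

The returned list, read left to right, is a candidate test sequence of global states; a state with no valid backward step toward an initial state yields None.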

Note, however, that not all backward steps are valid, and 
backtracking is performed when a backward step is invalid. 
Backtracking may occur when the preceding states contradict 
the rules of the protocol. These contradictions may manifest 
themselves as: 

• Src not found: src is the originator of the stimulus, and the
global state has to include at least one component to originate
the stimulus. An example of this contradiction occurs for the
Prune stimulus, for a global state {NH, F, NF}, where
an originating component of the Prune (NC in this case)
does not belong to the global state.

• Failure of minimum topology check: the necessary
conditions to trigger the stimulus must be present in the
global topology. Examples of failing the minimum topology
check include, for instance, a Join stimulus with global
state {NH, NF}, or an Assert stimulus with global state
{F, NH, NC}.

• Failure of consistency check: to maintain consistency of
the transition rules in the reverse direction, we must check
that every backward step has an equivalent forward step. To
achieve this, we must check that there is no transition x → y
for the given stimulus such that x ∈ GState. If x remained
in the preceding global state, the corresponding forward step
would transform x into y, and the system would exist in a
state inconsistent with the initial global state (before
the backward step). An example of this inconsistency exists
when the stimulus is FPkt and GState = {F, NF, EU},
where EU → F is a post-condition for FPkt. See Appendix
III for the consistency check procedure.
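The consistency check can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the procedure of Appendix III: the function name `consistent` and the table layout are hypothetical, and only the EU → F post-condition for FPkt comes from the text.

```python
# Post-conditions: stimulus -> list of (x, y) transitions x -> y.
# Only EU -> F for FPkt is taken from the text.
POST = {"FPkt": [("EU", "F")]}


def consistent(gstate, stim, post_conditions):
    """Return True if gstate can be the result of applying stim.

    The backward step is rejected when gstate still contains a state x
    for which stim has a transition x -> y (with y != x), because the
    forward step would have moved that component out of x.
    """
    moved_from = {x for x, y in post_conditions.get(stim, []) if x != y}
    return not any(s in moved_from for s in gstate)


print(consistent(("F", "NF", "EU"), "FPkt", POST))  # example from the text
print(consistent(("F", "NF"), "FPkt", POST))
```

The first call reproduces the inconsistent example above (EU is still present after FPkt), so the backward step is rejected and backtracking follows.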

C. Applying The Method 

In this section we discuss how the fault-oriented test
generation can be applied to the model of PIM-DM. Specifically,
we discuss in detail the application of FOTG to the
robustness analysis of PIM-DM in the presence of single
message loss and machine crashes. We first walk through a
simple illustrative example. Then we present the results of the
case study in terms of correctness violations captured by the
method.

C.1 Method input

The protocol model is provided by the designer or protocol
specification, in terms of a transition table and the
semantics of the messages. In addition, a list of faults to be
studied is given as input to the method; for example, the fault
may be defined as single selective protocol message loss,
applied to the list of messages {Join, Prune, Assert, Graft}.
A set of initial state symbols is also given, in our case
{NM, EU}. A definition of the design requirement, in this case
a definition of correctness, is also provided by the
specification. The rest of the process is automated.

C.2 Illustrative example 

The accompanying figure shows the phases of FOTG for a simple
example of a Join loss. Following are the steps taken for that
example:

[Figure: phases of FOTG for the Join-loss example. Panel
'Synthesizing the Global State': steps 1 to 5 extend G_I from
the Join post-condition, through the Join pre-condition (goto
Prune), to the Prune pre-condition state NC_j (goto Ext), with
NH implied from NC in G_I. Panel 'Forward implication': the
sequence without loss is compared with the loss w.r.t. one
router, which ends in an error.]



The traditional input/output transition table is sufficient for our
method. The pre/post-condition transition table can be derived
automatically therefrom.



Fig. 6 

Graft event sequencing 



fixes for those errors. 

Join: A scenario similar to that presented in Section V-C.2
incurred an error. In this case, the robustness violation
was not allowing another chance to the downstream router
to send a Join. A suggested fix would be to send another
Prune by F_Del before the timer expires.

Prune: In the topology above, an error occurs when Ri 
loses the Prune, hence no Join is triggered. The fix sug- 
gested above takes care of this case too. 

Assert: An error in the Assert case occurs with no downstream
routers; e.g. G_I = {F_i, F_j}. The design error is the
absence of a mechanism to prevent pruning packets in this
case. One suggested fix would be to have the Assert winner
schedule a deletion timer (i.e. become F_Del) and have the
downstream receiver (if any) send a Join to the Assert winner.

Graft: A Graft message is acknowledged by GAck, hence
the protocol did not incur an error when the Graft message
was lost with non-interleaved external events. The protocol
is robust to Graft loss with the use of the Rtx timer. Adverse
external conditions are interleaved during the transient states
and the Rtx timer is cleared, such that the adverse event will
not be overridden by the Rtx mechanism.

To clear the Rtx timer, a transition should be created from
NH_Rtx to NH, which is triggered by a GAck according to the
state dependency table ($NH \xleftarrow{GAck} NH_{Rtx}$). This
transition is then inserted in the event sequence, and forward
and backward implications are used to obtain the overall
sequence of events illustrated in Figure 6. In the first and
second scenarios (I and II) no error occurs. In the third
scenario (III), when a Graft followed by a Prune is interleaved
with the Graft loss, the Rtx timer is reset with the receipt of
the GAck for the first Graft, and the system ends up in an



Fig. 8 

Crash leading to black holes 



D. Challenges and Limitations 

Although we have been able to apply FOTG to PIM-DM 
successfully, a discussion of the open issues and challenges is 
called for. In this section we address some of these issues. 

• The topologies synthesized by the above FOTG study are
limited to a single-hop LAN with n routers. This
means that the above FOTG analysis is necessary but not
sufficient to verify robustness of the end-to-end behavior of
the protocol in a multi-hop topology; even if each LAN in the
topology operates correctly, the inter-LAN interaction may
introduce erroneous behaviors. Applying FOTG to multi-hop
topologies is part of future research.

• The analysis for our case studies did not consider network 
delays. In order to study end-to-end protocols network delays 
must be considered in the model. In we introduce the 
notion of virtual LAN to include end-to-end delay semantics. 

• Minimal topologies that are necessary and sufficient to
trigger the stimuli may not be sufficient to capture all
correctness violations. For example, in some cases it may
require one member to trigger a Join, but two members to
experience an error caused by Join loss. Hence, the topology
synthesis stage must be complete in order to capture all
possible errors. To achieve this we propose to use the symbolic
representation. For example, to cover all topologies with one
or more members we use ($M^{1+}$). Integration of this notation
with the full method is part of future work.

• The efficiency of the backward search may be increased using
reduction techniques, such as equivalence of states and
transitions (similar to the ones presented in Section IV). In
addition, the algorithm complexity may be reduced by utilizing
information about reachable states to reduce the search.
This information could be obtained simply by storing previous
sequences and states visited. Alternatively, the designer
may provide information, based on protocol-specific knowledge,
about reachable states, through a compact representation
thereof.

Footnote: The single-hop limitation is similar to that suffered
by FITG in Section IV.

• The topologies constructed by FOTG are inferred from the 
mechanisms specified by the transition table of the GFSM. 
The FOTG algorithm will not construct topologies resulting 
from non-specified mechanisms. For example, if the Assert 
mechanism that deals with duplicates was left out (due to 
a design error) the algorithm would not construct {Fi , Fj } 
topology. Hence, FOTG is not guaranteed to detect dupli- 
cates in this case. So, FOTG (as presented here) may be used 
to evaluate behavior of specified mechanisms in the presence 
of network failures, but is not a general protocol verification 
tool. 

• The global states synthesized during the topology synthe- 
sis phase are not guaranteed to be reachable from an ini- 
tial state. Hence the algorithm may be investigating non- 
reachable states, until they are detected as unreachable in 
the last backward search phase. Adding reachability detec- 
tion in the early stages of FOTG is subject of future work. 
However, statistics collected in our case study (see Appendix
III-F) show that unreachable states are not the determining
factor in the complexity of the backward search. Hence, other 
reduction techniques may be needed to increase the efficiency 
of the method. 

We believe that the strength of our fault-oriented method, 
as was demonstrated, lies in its ability to construct the nec- 
essary conditions for erroneous behavior by starting directly 
from the fault and avoiding the exhaustive walk of the state 
space. Also, converting timing problems into sequencing 
problems (as was shown for Graft analysis) reduces the com- 
plexity required to study timers. FOTG as presented in this 
chapter seems best fit to study protocol robustness in the 
presence of faults. Faults presented in our studies include 
single selective loss of protocol messages and router crashes. 

VI. Related Work 

The related work falls mainly in the field of protocol veri- 
fication, distributed algorithms and conformance testing. In 
addition, some concepts of our work were inspired by VLSI 
chip testing. Most of the literature on multicast protocol 
design addresses architecture, specification, and comparisons 
between different protocols. We are not aware of any other 
work to develop systematic methods for test generation for 
multicast protocols. 



There is a large body of literature dealing with verification
of communication protocols. Protocol verification is the
problem of ensuring the logical consistency of the protocol
specification, independent of any particular implementation.
Protocol verification typically addresses well-defined
properties, such as safety (e.g., freedom from deadlocks) and
liveness (e.g., absence of non-progress cycles) [16]. In general,
the two main approaches for protocol verification are theorem
proving and reachability analysis (or model checking).
In theorem proving, system properties are expressed in logic
formulas, defining a set of axioms and constructing relations
on these axioms. In contrast to reachability analysis, theorem
proving can deal with infinite state spaces. Interactive
theorem provers require human intervention, and hence are
slow and error-prone. Theorem proving includes model-based
and logic-based formalisms. Model-based formalisms (e.g.,
Z, VDM) are suitable for protocol specifications in a
succinct manner, but lack the tool support for effective proof
of properties. The use of first order logic allows the use of
theorem provers (e.g., Nqthm), but may result in specifications
that are difficult to read. Higher order logic (e.g.,
PVS [20]) provides expressive power for clear descriptions
and proof capabilities for protocol properties. The number
of axioms and relations grows with the complexity of the
protocol. Axiomatization and proofs depend largely on human
intelligence, which limits the use of theorem proving systems.
Moreover, these systems tend to abstract out the network
failures we are addressing in this study.

Reachability analysis algorithms attempt to generate and
inspect all the protocol states that are reachable
from given initial states. The main types of reachability
analysis algorithms include full search and controlled partial
search. If full search exceeds the memory or time limits, it
effectively reduces to an uncontrolled partial search, and the
quality of the analysis deteriorates quickly. Such algorithms
suffer from the 'state space explosion' problem, especially
for complex protocols. To circumvent this problem, state
reduction and controlled partial search techniques
could be used. These techniques focus only on parts of the
state space and may use probabilistic, random or
guided searches [26]. In our work we adopt approaches
extending reachability analysis for multicast protocols. Our
fault-independent test generation method (in Section IV)
borrows from controlled partial search and state reduction
techniques.

Work on distributed algorithms deals with synchronous
networks, asynchronous shared memory and asynchronous
networked systems. Proofs can be established using
an automata-theoretic framework. Several studies on
distributed algorithms considered failure models including
message loss or duplication, and processor failures, such as
stop (or crash) failures, transient failures, or byzantine
failures, where failed processors behave arbitrarily. We do
not consider byzantine failures in our study. Distributed
algorithms may be treated in a formal framework, using
automata-theoretic models and state machines, where results
are presented in terms of set-theoretic mathematics.
The formal framework is used to present proofs or
impossibility results. Proof methods for distributed algorithms
include invariant assertions and simulation relationships
that are generally proved using induction, and may be
checkable using theorem-provers, e.g., the Larch theorem-prover.
Asynchronous network components can be modeled as timed
automata.

Several attempts to apply formal verification to network
protocols have been made. Assertional proof techniques were
used to prove distance vector routing, path vector routing
and route diffusion algorithms, using
communicating finite state machines. An example point-to-point
mobile application was proved using assertional reasoning
in UNITY. Axiomatic reasoning was
used in proving a simple transmission protocol. Algebraic
systems based on the calculus of communicating systems
(CCS) have been used to prove CSMA/CD.
Formal verification has been applied to TCP and T/TCP.

Multicast protocols may be modeled as asynchronous networks,
with the components as timed automata, including
failure models. In fact, the global finite state machine
(GFSM) model used by our search algorithms is adopted from
asynchronous shared memory systems (specifically, cache
coherence algorithms) and extended with various multicast
and timing semantics. The transitions of the I/O automaton
may be given in the form of pre-conditions and effects.

The combination of timed automata, invariants, simulation
mappings, automaton composition, and temporal logic
seems to be a very useful set of tools for proving (or
disproving) and reasoning about safety or liveness properties
of distributed algorithms. It may also be used to establish
asymptotic bounds on the complexity of the distributed
algorithms. It is not clear, however, how theorem proving
techniques can be used in test synthesis to construct event
sequences and topologies that stress network protocols. Parts
of our work draw from distributed algorithms verification
principles. Yet we feel that our work complements such work,
as we focus on test synthesis problems.

Footnote: An invariant assertion is a property that holds true
for all reachable states of the system, while a simulation is a
formal relation between an abstract solution of the problem and
a detailed solution.

Footnote: The pre-condition and effect form is similar to our
representation of the transition table for the fault-oriented
test generation method.

Conformance testing is used to check that the external
behavior of a given implementation of a protocol is equivalent
to its formal specification. A conformance test fails
if the implementation and specification differ. By contrast,
verification of the protocol must always reveal the design
error. Given an implementation under test (IUT), sequences of
input messages are provided and the resulting output is
observed. The test passes only if all observed outputs match
those of the formal specification. The sequence of input
messages is called a conformance test suite, and the main
problem is to find an efficient procedure for generating a
conformance test suite for a given protocol. One possible
solution is to generate a sequence of state transitions that
passes through every state and every transition at least once,
also known as a transition tour. The state of the machine must
be checked after each transition with the help of unique
input/output (UIO) sequences. To be able to verify every
state in the IUT, we must be able to derive a UIO sequence
for every state separately. This approach generally suffers
from the following drawbacks. Not all states of an FSM have
a UIO sequence. Even if all states in an FSM have a UIO
sequence, the problem of deriving UIO sequences has been
proved to be p-complete; i.e. only very short UIO
sequences can be found in practice. UIO sequences can
identify states reliably only in a correct IUT. Their behavior
for faulty IUTs is unpredictable, and they cannot guarantee
that any type of fault in an IUT remains detectable. Only the
presence of desirable behavior can be tested by conformance
testing, not the absence of undesirable behavior.

Conformance testing techniques are important for testing
protocol implementations. However, they do not target design
errors or protocol performance. We consider work in this area
as complementary to the focus of our study.

Footnote: A Unique Input/Output (UIO) sequence is a sequence of
transitions that can be used to determine the state of the IUT.

Footnote: In [45] a randomized polynomial time algorithm is
presented for designing UIO checking sequences.

VLSI chip testing uses a set of well-established approaches
to generate test vector patterns, generally for detecting
physical defects in the VLSI fabrication process. Common test
vector generation methods detect single-stuck faults, where
the value of a line in the circuit is always at logic '1' or '0'.
Test vectors are generated based on a model of the circuit
and a given fault model. Test vector generation can be
fault-independent or fault-oriented. In the fault-oriented
process, the two fundamental steps in generating a test vector
are to activate (or excite) the fault, and to propagate the
resulting error to an observable output. Fault excitation and
error propagation usually involve a search procedure with
a backtracking strategy to resolve or undo contradictions in
the assignment of line and input values. The line assignments
performed sometimes determine or imply other line
assignments. The process of computing the line values to
be consistent with previously determined values is referred
to as implication. Forward implication is implying values of
lines from the fault toward the output, while backward
implication is implying values of lines from the fault toward
the circuit input. Our approaches for protocol testing use some
of the above principles, such as forward and backward
implication. VLSI chip testing, however, is performed for a
given circuit, whereas protocol testing is performed for
arbitrary and time-varying topologies.

Other related work includes verification of cache coherence
protocols. This study uses counting equivalence relations
and symbolic representation of states to reduce space search
complexity. We use the notion of counting equivalence in our
study.

VII. Conclusions 

In this study we have proposed the STRESS framework 
to integrate test generation into the protocol design process. 
Specifically, we targeted automatic test generation for robust- 
ness studies of multicast routing protocols. We have adopted 
a global FSM model to represent the multicast protocols on 
a LAN. In addition, we have used a fault model to represent 
packet loss and machine crashes. We have investigated two 
algorithms for test generation; namely, the fault-independent 
test generation (FITG) and the fault-oriented test genera- 
tion (FOTG). Both algorithms were used to study a stan- 
dard multicast routing protocol, PIM-DM, and were com- 
pared in terms of error coverage and algorithmic complexity. 
For FITG, equivalence reduction techniques were combined 
with forward search to reduce search complexity from ex- 
ponential to polynomial. FITG does not provide topology 
synthesis. For FOTG, a mix of forward and backward search 
techniques allowed for automatic synthesis of the topology. 
We believe that FOTG is a better fit for robustness studies 



since it targets faults directly. The complexity for FOTG 
was quite manageable for our case study. Corrections to er- 
rors captured in the study were proposed with the aid of our 
method and integrated into the latest PIM-DM specification. 
More case studies are needed to show more general applica- 
bility of our methodology. 

Appendix 

I. State Space Complexity 

In this appendix we present an analysis of the state space
complexity of our target system. Specifically, we present a
completeness proof of the state space and the formulae to
compute the size of the correct state space.

A. State Space Completeness 

We define the space of all states as $X^*$, denoting zero or
more routers in any state. We also define the algebraic
operators for the space, where

$$X^* = X^0 \cup X^1 \cup X^{2+} \quad (1)$$

$$(Y^n, X^*) = (Y^{n+}, \{X - Y\}^*) \quad (2)$$

A.l Error states 

In general, an error may manifest itself as packet dupli- 
cates, packet loss, or wasted bandwidth. This is mapped 
onto the state of the global FSM as follows: 

1. The existence of two or more forwarders on the LAN with 
one or more routers expecting packet from the LAN (e.g., in 
the NHx state) indicates duplicate delivery of packets. 

2. The existence of one or more routers expecting packets 
from the LAN with no forwarders on the LAN indicates a 
deficiency in packet delivery (join latency or black holes). 

3. The existence of one or more forwarders for the LAN with 
no routers expecting packets from the LAN indicates wasted 
bandwidth (leave latency or extra overhead). 

- for duplicates: one or more $NH_x$ with two or more $F_x$;

$$(NH_x, F_x^{2+}, X^*) \quad (3)$$

- for extra bandwidth: one or more $F_x$ with zero $NH_x$;

$$(F_x, \{X - NH_x\}^*) \quad (4)$$

- for black holes or packet loss: one or more $NH_x$ with zero
$F_x$;

$$(NH_x, \{X - F_x\}^*) \quad (5)$$
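The three error conditions can be expressed as a small classifier over a global state. The sketch below is illustrative only: it assumes that $NH_x$ stands for {NH, NH_Rtx} and $F_x$ for {F, F_Del}, which is inferred from the state symbols used elsewhere in this paper, and the function name `classify` is hypothetical.

```python
# Assumed symbol classes (inferred, not stated explicitly in this appendix).
NH_X = {"NH", "NH_Rtx"}  # routers expecting packets from the LAN
F_X = {"F", "F_Del"}     # forwarders for the LAN


def classify(gstate):
    """Classify a global state (a sequence of router states)."""
    receivers = sum(s in NH_X for s in gstate)
    forwarders = sum(s in F_X for s in gstate)
    if receivers >= 1 and forwarders >= 2:
        return "duplicates"        # form (3)
    if forwarders >= 1 and receivers == 0:
        return "wasted bandwidth"  # form (4)
    if receivers >= 1 and forwarders == 0:
        return "black hole"        # form (5)
    return "correct"


for g in [("NH", "F", "F_Del"), ("F", "NC"), ("NH", "NF"), ("NH", "F", "NC")]:
    print(g, classify(g))
```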
A.2 Correct states

As described earlier, the correct states can be described by
the following rule:

∃ exactly one forwarder for the LAN iff ∃ one or more
routers expecting packets from the LAN.

- zero $NH_x$ with zero $F_x$;

$$(\{X - NH_x - F_x\}^*) \quad (6)$$

- one or more $NH_x$ with exactly one $F_x$;

$$(NH_x, F_x, \{X - F_x\}^*) \quad (7)$$

From (B.2) and (B.3) we get:

$$(NH_x, F_x^{2+}, \{X - F_x\}^*) \quad (8)$$

If we take the union of (B.8), (B.5) and (B.7), and apply
(B.1), we get:

$$(NH_x, X^*) = (NH_x^{1+}, \{X - NH_x\}^*) \quad (9)$$

Also, from (B.4) and (B.2) we get:

$$(F_x^{1+}, \{X - NH_x - F_x\}^*) \quad (10)$$

If we take the union of (B.10) and (B.6) we get:

$$(F_x^*, \{X - NH_x - F_x\}^*) = (\{X - NH_x\}^*) \quad (11)$$

Taking the union of (B.9) and (B.11) we get:

$$(NH_x^*, \{X - NH_x\}^*) = (X^*) \quad (12)$$

which is the complete state space.

B. Number of Correct and Error State Spaces

B.1 First case definition

For the correct states: $(\{X - NH - F\}^*)$ reduces the
symbols from which to choose the state by 2; i.e. yields the
formula:

$$C(n + (s - 2) - 1, n) = C(n + s - 3, n).$$

While $(NH, F, \{X - F\}^*)$ reduces the number of routers
to choose by 2 and the number of symbols by 1, yielding:

$$C((n - 2) + (s - 1) - 1, n - 2) = C(n + s - 4, n - 2).$$

B.2 Second case definition

For the correct states: $(\{X - NH_x - F_x\}^*)$ reduces the
number of state symbols by 4, yielding

$$C(n + (s - 4) - 1, n) = C(n + s - 5, n).$$

While $(NH_x, F_x, \{X - F_x\}^*)$ reduces the number of
routers to $n - 2$ and the symbols to $s - 2$ and yields

$$4 \cdot C((n - 2) + (s - 2) - 1, n - 2) = 4 \cdot C(n + s - 5, n - 2).$$

We have to be careful here about overlap of sets of correct
states. For example $(NH, F, \{X - F_x\}^*)$ is equivalent to
$(NH_{Rtx}, F, \{X - F_x\}^*)$ when a third router is in
$NH_{Rtx}$ in the first set and $NH$ in the second set. Thus we
need to remove one of the sets $(NH, F, NH_{Rtx}, \{X - F_x\}^*)$,
which translates in terms of number of states to

$$C((n - 3) + (s - 2) - 1, n - 3) = C(n + s - 6, n - 3).$$

A similar argument is given when we replace $F$ above by
$F_{Del}$, thus we multiply the number of states to be removed
by 2. Thus, we get the total number of equivalent correct
states:

$$C(n + s - 5, n) + 4 \cdot C(n + s - 5, n - 2) - 2 \cdot C(n + s - 6, n - 3).$$

To obtain the ErrorStates we can use:

ErrorStates = TotalStates - CorrectStates.
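The second-case formula can be checked numerically. The Python sketch below evaluates it and compares it against a brute-force count over all equivalent global states (multisets of router states). It rests on assumptions that are not stated in this appendix: TotalStates is taken to be the multiset count C(n + s - 1, n), $NH_x$ = {NH, NH_Rtx}, $F_x$ = {F, F_Del}, C(a, b) is treated as 0 when b is out of range, and the remaining symbols ("O1", "O2") are placeholders.

```python
from itertools import combinations_with_replacement
from math import comb

NH_X = {"NH", "NH_Rtx"}
F_X = {"F", "F_Del"}


def c(a, b):
    """Binomial coefficient, treated as 0 outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def correct_states(n, s):
    """Second-case formula for the number of equivalent correct states."""
    return c(n + s - 5, n) + 4 * c(n + s - 5, n - 2) - 2 * c(n + s - 6, n - 3)


def total_states(n, s):
    """Assumed total: multisets of size n over s state symbols."""
    return c(n + s - 1, n)


def brute_force_correct(n, symbols):
    """Count multisets matching forms (6) or (7) by enumeration."""
    count = 0
    for g in combinations_with_replacement(symbols, n):
        receivers = sum(x in NH_X for x in g)
        forwarders = sum(x in F_X for x in g)
        if (receivers == 0 and forwarders == 0) or (
            receivers >= 1 and forwarders == 1
        ):
            count += 1
    return count


symbols = ["NH", "NH_Rtx", "F", "F_Del", "O1", "O2"]  # s = 6, placeholders
for n in range(1, 6):
    correct = correct_states(n, len(symbols))
    error = total_states(n, len(symbols)) - correct
    print(n, correct, brute_force_correct(n, symbols), error)
```

For these small cases the closed form and the enumeration agree, which supports the overlap correction term.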
Fig. 9

The percentage of the correct and error states

Figure 9 shows the percentage of each of the correct and
error state spaces, and how this percentage changes with the
number of routers. The figure is shown for the second case
error definition. Similar results were obtained for the first
case definition.
II. Forward Search Algorithms

This appendix includes detailed procedures that implement
the forward search method as described in Section IV.
It also includes detailed statistics collected for the case study
on PIM-DM.

A. Exhaustive Search 

The ExpandSpace procedure given below implements an 
exhaustive search, where W is the working set of states to 
be expanded, V is the set of visited states (i.e. already ex- 
panded), and E is the state currently being explored. Ini- 
tially, all the state sets are empty. The nextState function 
gets and removes the next state from W, according to the 
search strategy; if depth first then W is treated as a stack, 
or as a queue if breadth first. 

Each state is expanded by applying the stimuli via the 
'forward' procedure that implements the transition rules and 
returns the new stable state New. 

ExpandSpace(initGState){
    add initGState to W;
    while W not empty {
        E = nextGState from W;
        add E to V;
        ∀ state ∈ E
            ∀ stim applying to state {
                New = forward(E, stim);
                if New ∉ W or V
                    add New to W;
            }
    }
}
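The same working-set search can be sketched in runnable Python. This is a minimal illustration, not the authors' implementation: the protocol is a hypothetical toy in which each component advances A → B → C, and `stimuli_for` and `forward` are stand-ins for the transition rules of a real protocol model.

```python
from collections import deque

# Hypothetical toy protocol: each component advances A -> B -> C.
NEXT = {"A": "B", "B": "C"}


def stimuli_for(state):
    """Stimuli applicable to a component state (toy stand-in)."""
    return [("step", state)] if state in NEXT else []


def forward(gstate, stim):
    """Apply stim to one matching component and return the new stable state."""
    _, src = stim
    parts = list(gstate)
    parts[parts.index(src)] = NEXT[src]
    return tuple(sorted(parts))


def expand_space(init_gstate, stimuli_for, forward, depth_first=False):
    """Exhaustive forward search; W is a stack or a queue, V the visited set."""
    working = deque([init_gstate])
    visited = set()
    while working:
        e = working.pop() if depth_first else working.popleft()
        visited.add(e)
        for state in e:
            for stim in stimuli_for(state):
                new = forward(e, stim)
                if new not in visited and new not in working:
                    working.append(new)
    return visited


print(sorted(expand_space(("A", "A"), stimuli_for, forward)))
```

Treating the working set as a stack or a queue changes only the visiting order, not the set of states reached.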

The initial state initGState may be generated using the
following procedure, which produces all possible combinations
of initial states I.S.

Init(depth, GState){
    ∀ state ∈ I.S. {
        add state to GState;
        if depth - 1 = 0
            ExpandSpace(GState);
        else
            Init(depth - 1, GState);
        remove last element of GState;
    }
}

This procedure is called with the following parameters:
(a) number of routers n as the initial depth and (b) the
emptystate as the initial GState. It is a recursive procedure
that does a tree search, depth first, with the number of
levels equal to the number of routers and the branching factor
equal to the number of initial state symbols |I.S.| = i.s.
The complexity of this procedure is given by $(i.s.)^n$.

Fig. 10

Simulation statistics for forward algorithms. ExpandedStates is
the number of visited states.

Fig. 11

Simulation statistics for forward algorithms. Forwards is the
number of calls to forward().
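The enumeration performed by Init, and its $(i.s.)^n$ growth, can be reproduced with a short Python sketch. The function below mirrors the recursive tree search but collects the generated global states in a list instead of calling ExpandSpace; that change, and the names used, are assumptions made for illustration.

```python
from itertools import product


def init(depth, gstate, initial_symbols, out):
    """Recursive depth-first enumeration of all initial global states."""
    for state in initial_symbols:
        gstate.append(state)
        if depth - 1 == 0:
            out.append(tuple(gstate))  # a full n-router initial state
        else:
            init(depth - 1, gstate, initial_symbols, out)
        gstate.pop()  # remove last element before trying the next symbol
    return out


I_S = ["NM", "EU"]  # initial state symbols from the case study
states = init(3, [], I_S, [])
print(len(states), len(I_S) ** 3)
print(states == list(product(I_S, repeat=3)))
```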

B. Reduction Using Equivalence 

We use the counting equivalence notion to reduce the com- 
plexity of the search in 3 ways: 

1. The first reduction we use is to investigate only the equiv- 
alent initial states, we call this algorithm Equiv. One proce- 
dure that produces such equivalent initial state space is the 
Equivlnit procedure given below. 

EquivInit(S, i, GState){
    ∀ state ∈ S
        for j = i to 0 {
            New = emptystate;
            for k = 0 to j
                add state to New;
            New = New · GState;
            S = trunc(S, state);
            if (i - j) = 0
                ExpandSpace(New);
            else
                EquivInit(S, i - j, New);
        }
}

This procedure is invoked with the following parameters: (a)
the initial set of states I.S. as S, (b) the number of routers
n as i, and (c) the emptystate as GState. The procedure
is recursive and produces the set of equivalent initial states
and invokes the ExpandSpace procedure for each equivalent
initial state. The 'trunc' function truncates S such that S
contains only the state elements in S after the element state.
For example, trunc({F, NM, M}, F) = {NM, M}.

Fig. 12

Simulation statistics for forward algorithms. Transitions is the
number of transient states visited.

              Error States
Rtrs  Exhaustive   Equiv  Equiv+  Reduced   Reduction
   1           1       1       1        1           1
   2           7       3       3        3    2.333333
   3          33       7       6        6         5.5
   4         191      21      13       13    14.69231
   5         783      49      25       25       31.32
   6        3235     115      43       43    75.23256
   7       11497     239      68       68    169.0735
   8       41977     504     101      101    415.6139
   9      142197    1012     143      143    994.3846
  10      491195    2057     195      195    2518.949
  11     1625880    4101     258      258     6301.86
  12     5441177    8237     333      333    16339.87
  13    17751178   16425     421      421    42164.32
  14    58220193   32879     523      523    111319.7

Fig. 13

Simulation statistics for forward algorithms. The number of
stable error states reached.

2. The second reduction we use is during state comparison.
Instead of comparing the actual states, we compare and store
equivalent states. Hence, the line 'if New ∉ W or V' would
check for equivalent states. We call the algorithm after this
second reduction Equiv+.

3. The third reduction is made to eliminate redundant
transitions. To achieve this reduction we add a flag check
before invoking forward, such as stateFlag. The flag is set to 1
when the stimuli for that specific state have been applied.
We call the algorithm after the third reduction the reduced
algorithm.
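The first two reductions can be illustrated in Python. The sketch assumes that counting equivalence treats two global states as equivalent when they contain the same number of routers in each state; the helper names `canonical` and `equivalent_initial_gstates` are hypothetical and do not reproduce the EquivInit recursion itself.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import comb


def canonical(gstate):
    """Canonical key under counting equivalence: (state, count) pairs."""
    return tuple(sorted(Counter(gstate).items()))


def equivalent_initial_gstates(initial_symbols, n):
    """One representative per equivalence class of initial global states."""
    return list(combinations_with_replacement(sorted(initial_symbols), n))


I_S = {"NM", "EU"}
n = 3
exhaustive = list(product(sorted(I_S), repeat=n))
reduced = equivalent_initial_gstates(I_S, n)
print(len(exhaustive), len(reduced), comb(n + len(I_S) - 1, n))

# Comparing canonical keys instead of raw states (second reduction).
print(canonical(("F", "NH", "F")) == canonical(("NH", "F", "F")))
```

Storing canonical keys in the working and visited sets is one way to realize the equivalent-state comparison used by Equiv+.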

C. Complexity analysis of forward search for PIM-DM 

The number of reachable states visited, the number of
transitions and the number of erroneous states found were
recorded. The result is given in Figures 10 through 13. The
reduction is the ratio of the numbers obtained using the
exhaustive algorithm to those obtained using the reduced
algorithm.

The number of expanded states denotes the number of vis- 
ited stable states and is measured simply as the number of 
states in the set V in 'ExpandSpace' procedure. The number 
of forwards is the number of times the 'forward' procedure 
was called denoting the number of transitions between stable 
states. The number of transitions is the number of visited 
transient states that are increased with every new state vis- 
ited in the 'forward' procedure. The number of error states 
is the number of stable (or expanded) states violating the 
correctness conditions. 

The number of transitions is reduced from $O(4^n)$ for the
exhaustive algorithm to $O(n^4)$ for the reduced algorithm.
This means that we have obtained exponential reduction in
complexity, as shown in Figure 14.

[Plot: reduction ratio (log scale) versus number of routers (n)]

Fig. 14

Reduction ratio from exhaustive to the reduced algorithm

III. FOTG Algorithms

This appendix includes pseudo-code for procedures implementing
the fault-oriented test generation (FOTG) method
presented in Section V. In addition, it includes detailed
results of our case study to apply FOTG to PIM-DM.

A. Pre- Conditions 

The procedure described below takes as input the set of
post-conditions for the FSM stimuli and generates the set
of pre-conditions. The 'conds' array contains the
post-conditions (i.e., the effects of the stimuli on the system)
and is indexed by the stimulus. The 'stimulus' function returns
the stimulus (if any) of the condition. The 'transition'
function returns the transition or state of the condition. The
pre-conditions are stored in an array 'preConds' indexed by
the stimulus.

PreConditions{
    ∀ stim ∈ T
        ∀ cond ∈ conds[stim] {
            s = stimulus(cond);
            t = transition(cond);
            add t.stim to preConds[s];
        }
}

B. Dependency Table 

The 'dependencyTable' procedure generates the dependency
table depTable from the transition table of conditions
conds.

dependencyTable {
    ∀ stim ∈ T
        ∀ cond ∈ conds[stim] {
            endState = end(cond);
            startState = start(cond);
            add startState.stim to depTable[endState];
        }
}

For each state s that is the endState of a transition, a set of
startState-stimulus pairs leading to the creation of s is
stored in the depTable array. For s ∈ I.S., a symbol denoting
the initial state is added to the array entry. For our case study,
I.S. = {NM, EU}.

Footnote: If there is a state in the condition, this may be viewed as a
state → state transition, i.e., a transition to the same state.

C. Topology Synthesis

The following procedure synthesizes minimum topologies
necessary to trigger the various stimuli of the protocol. It
performs the third and fourth steps of the topology synthesis
procedure explained in Section V-B.

buildMinTopos(stim) {
    ∀ cond ∈ preConds[stim] {
        st = end(cond);
        stm = stimulus(cond);
        if type(stm) = orig
            add st to MinTopos[stim];
        else {
            if ∄ MinTopos[stm]
                buildMinTopos(stm);
            ∀ topo ∈ MinTopos[stim]
                add st to MinTopos[stim];
        }
    }
}

D. Backward Search

The 'Backward' procedure calls the 'Rewind' procedure to
perform the backward search. A set of visited states V is kept
to avoid looping. For each state in GState, possible backward
implications are attempted to obtain valid backward
steps toward an initial state. 'Backward' is called recursively
for preceding states as a depth-first search. If all backward
branches are exhausted and no initial state was reached, the
state is declared unreachable.

Backward(GState) {
    if GState ∈ V
        return;
    add GState to V
    ∀ s ∈ GState {
        bkwds = depTable[s];
        ∀ bk ∈ bkwds {
            New = Rewind(bk, GState, s);
            if New = done
                break;
            if New = backTrack
                continue;
            else
                Backward(New);
        }
    }
}
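The backward search relies on the dependency table of Section B. A minimal Python sketch of how such a table could be built is given below. The function name `dependency_table`, the `INITIAL` marker, and the toy transitions are assumptions made for illustration and do not reproduce the PIM-DM transition table.

```python
from collections import defaultdict

INITIAL = "I.S."  # hypothetical marker meaning "this state symbol is an initial state"


def dependency_table(conds, initial_states):
    """Map each endState to the (startState, stimulus) pairs that can create it."""
    dep_table = defaultdict(list)
    for stim, transitions in conds.items():
        for start_state, end_state in transitions:
            dep_table[end_state].append((start_state, stim))
    # For initial state symbols, add a marker so a backward search knows it may stop.
    for s in initial_states:
        dep_table[s].append(INITIAL)
    return dict(dep_table)


# Hypothetical toy transition table: stimulus -> list of (startState, endState).
conds = {"x": [("I0", "A")], "y": [("A", "B"), ("I0", "B")]}
dep = dependency_table(conds, {"I0"})
```

Indexing by endState is what makes the backward step cheap: given a state symbol in the current global state, the candidate predecessors are read off directly.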



                        Total                              Average
                 Backwards  Rewinds  BackTracks   Backwards  Rewinds  BackTracks
Unreachable (6)        223      586         293       37.16     97.6        48.8
Reachable (16)       23030    61212       31736        1439     3825        1983
Total (22)           23253    61798       32029        1057     2809        1455

Fig. 15

Case study statistics for applying FOTG to PIM-DM

The 'Rewind' procedure takes the global state one
step backward by applying the reverse transition rules.
'replace(s, st, GState)' replaces s in GState with st and returns
the new global state. Depending on the stimulus
type of the backward rule bk, different states in GState are
rolled back. For orig and dst, only the originator and the
destination of the stimulus are rolled back, respectively. For
mcast, all affected states are rolled back except the originator.
mcastDownstream is similar to mcast except that
all downstream routers or states are rolled back, while only
one upstream router (the destination) is rolled back.

Rewind(bk, GState, s) {
    if bk ∈ I.S.
        return done;
    stim = stimulus(bk);
    st = start(bk);
    if type(stim) = orig {
        New = replace(s, st, GState);
        return New;
    }
    ∀ cond ∈ preConds[stim] &
    while src not found {
        str = start(cond);
        if str ∈ GState
            src found
    }
    if src not found
        return backTrack;
    if type(stim) = dst {
        New = replace(s, st, GState);
        if checkMinTopo(New, stim)
            return New;
        else
            return backTrack;
    }
    if not checkConsistency(stim, GState)
        return backTrack;
    New = GState;
    if type(stim) = mcast
        ∀ cond ∈ conds[stim]
            if end(cond) ∈ GState & not src
                New = replace(end, start, GState);
    if type(stim) = mcastDownstream
        ∀ cond ∈ conds[stim]
            if end(cond) ∈ GState & not upstream
                New = replace(end, start, GState);
            else if end ∈ GState & upstream
                New = replace(end, start, GState) once;
    if checkMinTopo(New, stim)
        return New;
    else
        return backTrack;
}
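The interplay between a depth-first backward search and a one-step rewind can be sketched as follows. This is a deliberately simplified sketch, not the procedure above: it assumes every stimulus is of type orig (only one component is rolled back per step), omits the consistency and minimum-topology checks, and treats a global state as reached when all its components are initial state symbols. The function names and the toy dependency table are hypothetical.

```python
def replace(s, st, gstate):
    """Return gstate with one occurrence of s replaced by st (kept sorted)."""
    states = list(gstate)
    states[states.index(s)] = st
    return tuple(sorted(states))


def backward(gstate, dep_table, initial, visited=None):
    """Depth-first backward search.

    Returns the list of (stimulus, predecessor state) steps leading back to an
    initial global state, or None if the state is unreachable.
    """
    gstate = tuple(sorted(gstate))
    if visited is None:
        visited = set()
    if all(s in initial for s in gstate):
        return []                       # reached an initial global state
    if gstate in visited:
        return None                     # already explored: avoid looping
    visited.add(gstate)
    for s in sorted(set(gstate)):
        for start, stim in dep_table.get(s, []):
            prev = replace(s, start, gstate)   # one backward step
            rest = backward(prev, dep_table, initial, visited)
            if rest is not None:
                return [(stim, prev)] + rest
    return None                         # all branches exhausted: unreachable


# Hypothetical toy table: endState -> list of (startState, stimulus).
dep = {"A": [("I0", "x")], "B": [("A", "y")], "C": [("C", "z")]}
path = backward(("B", "I0"), dep, {"I0"})
dead_end = backward(("C", "I0"), dep, {"I0"})
```

In the toy table, state `C` can only be created from itself, so the search for `("C", "I0")` exhausts its branches and reports the state as unreachable.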

The following procedure checks for consistency of applying 
stim to GState. 

checkConsistency(stim, GState) {
    ∀ cond ∈ conds[stim] & cond has transition
        if start(cond) ∈ GState
            return False;
        else
            return True;
}
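One way to read this check is that a state is inconsistent with stim if some router is still in the start state of a transition that stim would have triggered. The sketch below follows that reading, returning False if any such condition is found; the function name, the tuple encoding of conditions, and the toy data are assumptions for illustration.

```python
def check_consistency(stim, gstate, conds):
    """Return False if gstate still holds the start state of a transition caused by stim."""
    for start, _end in conds.get(stim, []):
        # Conditions without a transition are encoded with start = None and skipped.
        if start is not None and start in gstate:
            return False
    return True


# Hypothetical toy table: stimulus -> list of (startState, endState).
conds = {"m": [("A", "B"), (None, "C")]}
ok = check_consistency("m", ("B", "D"), conds)
bad = check_consistency("m", ("A", "B"), conds)
```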



The following procedure checks if GState contains the nec- 
essary components to trigger the stimulus. 

checkMinTopo(GState, stim) {
    if ∃ MinTopos[stim] ⊆ GState
        return True;
    else
        return False;
}
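Because a global state may contain the same state symbol several times, the containment test is naturally a multiset comparison. A minimal sketch using Python's `collections.Counter` is shown below; the function name and the toy minimum topologies are hypothetical.

```python
from collections import Counter


def check_min_topo(gstate, stim, min_topos):
    """Return True if some minimum topology for stim is contained in gstate."""
    have = Counter(gstate)
    # Counter subtraction keeps only positive counts, so an empty result
    # means every required state symbol is present often enough.
    return any(not (Counter(topo) - have) for topo in min_topos.get(stim, []))


# Hypothetical toy data: stimulus -> list of minimum topologies.
min_topos = {"g": [("A", "B"), ("C", "C")]}
found = check_min_topo(("A", "B", "D"), "g", min_topos)
missing = check_min_topo(("A", "C"), "g", min_topos)
```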

E. Simulation results 

We have conducted a case study of PIM-DM analysis using
FOTG. A total of 22 topologies were automatically constructed,
using as faults the selective loss of Join/Prune,
Graft, and Assert messages. Of the constructed topologies
(or global states), 6 were unreachable global states and 16
were reachable. The total and average numbers of backward
calls, rewind calls, and backtracks are given in
Figure 15.
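As a quick sanity check on the case-study statistics (Fig. 15), the averages can be recomputed from the totals and the number of global states in each class. The sketch below only uses the numbers reported in the table; the variable and function names are illustrative.

```python
# class -> (number of states, totals, reported averages),
# each triple ordered as (Backwards, Rewinds, BackTracks)
rows = {
    "Unreachable": (6, (223, 586, 293), (37.16, 97.6, 48.8)),
    "Reachable": (16, (23030, 61212, 31736), (1439, 3825, 1983)),
    "Total": (22, (23253, 61798, 32029), (1057, 2809, 1455)),
}


def averages(count, totals):
    """Average number of calls per global state."""
    return tuple(t / count for t in totals)


for name, (count, totals, reported) in rows.items():
    computed = averages(count, totals)
    print(name, [round(c, 2) for c in computed], reported)

# The "Total" row should be the column-wise sum of the other two rows.
column_sums = tuple(
    u + r for u, r in zip(rows["Unreachable"][1], rows["Reachable"][1])
)
```

The recomputed averages agree with the reported ones up to rounding, and they make the main observation concrete: a reachable state costs roughly forty times as many calls on average as an unreachable one.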

Although the topology synthesis study we have presented 
above is not complete, we have covered a large number of 
corner cases using only a manageable number of topologies 
and search steps. 

To obtain a complete representation of the topologies, we
suggest using the symbolic representation presented earlier
in the paper. Based on our initial estimates, we expect the
number of symbolic topology representations to be approximately
224 topologies, ranging from 2-router to 8-router LAN
topologies, for the single selective loss and single crash models.

F. Experimental statistics for PIM-DM 

To investigate the utility of FOTG as a verification tool we 
ran this set of simulations. This is not, however, how FOTG 
is used to study protocol robustness (see previous section for 
case study analysis). 

We also wanted to study the effect of unreachable states on 
the complexity of the verification. The simulations for our 
case study show that unreachable states do not contribute 
in a significant manner to the complexity of the backward 
search for larger topologies. Hence, in order to use FOTG as 
a verification tool, it is not sufficient to add the reachability 
detection capability to FOTG. 

The backward search was applied to the equivalent error
states (for LANs with 2 to 5 routers connected). The simulation
setup involved a call to a procedure similar to 'EquivInit'
in Appendix II-B, with the parameter S as the set of state
symbols, and after an error check was done, a call was made to
the 'Backward' procedure instead of 'ExpandSpace'.

Footnote: We have used the repetition constructs '0', '1',

Fig. 16

Simulation statistics for backward algorithms

Fig. 17

Complexity of the FOTG algorithm for error states

States were classified as reachable or unreachable. For the
four topologies studied (LANs with 2 to 5 routers), statistics
(e.g., max, min, median, average, and total) were measured
for the number of calls to the 'Backward' and 'Rewind' procedures
and for the number of backTracks. As
shown in Figure 16, as the topology
grows, all the numbers for the reachable states get significantly
larger than those for the unreachable states (as in
Figure 17), despite the fact that the percentage of unreachable
states increases with the topology, as in Figure 18.
This is because, when
the state is unreachable, the algorithm reaches a dead end relatively
early (by exhausting one branch of the search tree).
For reachable states, however, the algorithm keeps on searching
until it reaches an initial global state. Hence the search
for reachable states is the major contributor to the
complexity of the algorithm.





[Plot: percentage of error states (y-axis, ticks 10 to 90) versus number of routers (n); series: Unreachable, Reachable]



Fig. 18 

Percentage of reachable/unreachable error states using FOTG 



G. Results 

We have implemented an early version of the algorithm in
the NS/VINT environment (see http://catarina.usc.edu/vint)
and used it to drive detailed simulations of PIM-DM therein,
to verify our findings. In this section we discuss the results of
applying our method to PIM-DM. The analysis is conducted
for single selective message loss.

For the following analyzed messages, we present the steps 
for topology synthesis, forward and backward implication. 



G.1 Join



Following are the resulting steps for join loss: 



Fig. 19 

A topology having a $\{F_i, F_j, \ldots, F_k\}$ LAN



Synthesizing the Global State

1. Set the inspected message to $Graft_{Rcv}$.
2. The startState of the post-condition is NF $\Rightarrow G_I = \{NF\}$.
3. The endState of the pre-condition is $NH_{Rtx} \Rightarrow G_I = \{NF, NH_{Rtx}\}$.
4. The stimulus of the pre-condition is $Graft_{Tx}$.
5. The startState of the post-condition is NH, implied from $NH_{Rtx}$ in $G_I$.
6. The endState of the pre-condition is NH, which may be implied.
7. The stimulus of the pre-condition is HJ, which is Ext (external).



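Without loss:
$G_I = \{NH, NF\} \xrightarrow{Graft_{Tx}} G_{I+1} = \{NH_{Rtx}, NF\} \xrightarrow{Graft_{Rcv}} G_{I+2} = \{NH_{Rtx}, F\} \xrightarrow{GAck} G_{I+3} = \{NH, F\}$, correct state.

With loss of Graft:
$G_I = \{NH, NF\} \xrightarrow{Graft_{Tx}} G_{I+1} = \{NH_{Rtx}, NF\} \xrightarrow{Timer} \cdots \xrightarrow{Graft_{Tx}} \cdots \xrightarrow{Graft_{Rcv}} G_{I+4} = \{NH_{Rtx}, F\} \xrightarrow{GAck} G_{I+5} = \{NH, F\}$, correct state.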

We did not reach an error state when the Graft was lost, 
with non-interleaving external events. 



H. Interleaving events and Sequencing 

A Graft message is acknowledged by the Graft-Ack
(GAck) message, and if not acknowledged it is retransmitted
when the retransmission timer expires. In an attempt to create
an erroneous scenario, the algorithm generates sequences
that clear the retransmission timer and insert an adverse event.
Since the Graft reception causes an upstream router to become
a forwarder for the LAN, the algorithm interleaves a
Leave event as an adverse event to cause that upstream
router to become a non-forwarder.

To clear the retransmission timer, the algorithm inserts the
transition $(NH \leftarrow NH_{Rtx})$ in the event sequence.

Forward Implication:

$G_I = \{NH, NF\} \rightarrow G_{I+1} = \{NH_{Rtx}, NF\} \rightarrow G_{I+2} = \{NH, NF\}$, error state.
Backward Implication:

Using backward implication, we can construct a sequence
of events leading to conditions sufficient to trigger the GAck.
From the transition table these conditions are $\{NH_{Rtx}, F\}$:

$G_I = \{NH, NF\} \leftarrow G_{I-1} = \{NC, NF\} \leftarrow G_{I-2} = \{NC, F_{Del}\} \leftarrow G_{I-3} = \{NC, F\} \leftarrow G_{I-4} = \{NH_{Rtx}, F\}$.

Footnote: We do not show all branching or backtracking steps for simplicity.

To generate the GAck we continue the backward implication
and attempt to reach an initial state:

$G_{I-4} = \{NH_{Rtx}, F\} \leftarrow G_{I-5} = \{NH_{Rtx}, NF\} \leftarrow G_{I-6} = \{NH, NF\} \leftarrow G_{I-7} = \{NC, NF\} \leftarrow G_{I-8} = \{NC, F_{Del}\} \leftarrow G_{I-9} = \{NC, F\} \leftarrow G_{I-10} = \{NM, F\} = \{NM, EU\} = I.S.$

Hence, when a Graft followed by a Prune is interleaved
with the Graft loss, the retransmission timer is reset on
receipt of the GAck for the first Graft, and the system
ends up in an error state.
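The effect of the interleaving can be illustrated with a toy replay of events on a two-router LAN. This is a simplified sketch and not the PIM-DM transition table: it keeps only the states NH, $NH_{Rtx}$, NF and F, collapses the intermediate states (such as NC and $F_{Del}$) that appear in the sequences above, and uses hypothetical event names. A final state is treated as erroneous when the downstream router has members, no retransmission is pending, and the upstream router is not forwarding.

```python
def run(events):
    """Replay events on a toy two-router LAN; return (downstream, upstream)."""
    down, up = "NH", "NF"
    for ev in events:
        if ev == "graft_tx":
            down = "NH_Rtx"   # Graft sent, retransmission timer running
        elif ev == "graft_rcv" and up == "NF":
            up = "F"          # upstream becomes a forwarder
        elif ev == "gack_rcv" and down == "NH_Rtx":
            down = "NH"       # GAck clears the retransmission timer
        elif ev == "prune_rcv" and up == "F":
            up = "NF"         # upstream stops forwarding
        # "graft_lost" and "timer" leave both states unchanged in this toy
        # model; a retransmission is written as an explicit "graft_tx" event.
    return down, up


def is_error(final_state):
    """Members downstream, no pending retransmission, upstream not forwarding."""
    return final_state == ("NH", "NF")


no_loss = run(["graft_tx", "graft_rcv", "gack_rcv"])
single_loss = run(
    ["graft_tx", "graft_lost", "timer", "graft_tx", "graft_rcv", "gack_rcv"]
)
# First Graft is received, a Prune follows, the second Graft is lost, and the
# GAck for the first Graft then clears the timer of the second one.
interleaved = run(
    ["graft_tx", "graft_rcv", "prune_rcv", "graft_tx", "graft_lost", "gack_rcv"]
)
```

In this toy model the single loss is recovered by the retransmission, while the interleaved sequence leaves no timer running to recover from the lost Graft.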